DITA-OT pre-processing architecture

This topic describes the set of steps commonly known as the pre-processing stage of a DITA build. These steps typically run at the start of any build using the DITA-OT, regardless of the final output format.

Each step described corresponds to one Ant target in the build pipeline. The general Ant target "preprocess" will call all of the targets described here.

Generate lists (gen-list)

The gen-list step examines the input files and creates lists of topics, images, document properties, or other content. These lists are used by later steps in the pipeline. For example, one list includes all topics that make use of the conref attribute; only those files are processed during the conref stage of the build. This step is implemented in Ant and Java.

The result of this step is a set of list files in the temporary directory, including dita.list and dita.xml.properties.

List file property | List file | List property | Usage
canditopicsfile | canditopics.list | canditopicslist |
chunkedditamapfile | chunkedditamap.list | chunkedditamaplist |
chunkedtopicfile | chunkedtopic.list | chunkedtopiclist |
codereffile | coderef.list | codereflist | Topics with coderef elements
conreffile | conref.list | conreflist | Documents that contain conref attributes that need to be resolved in preprocessing
conrefpushfile | conrefpush.list | conrefpushlist |
conreftargetsfile | conreftargets.list | conreftargetslist |
copytosourcefile | copytosource.list | copytosourcelist |
copytotarget2sourcemapfile | copytotarget2sourcemap.list | copytotarget2sourcemaplist |
flagimagefile | flagimage.list | flagimagelist |
fullditamapandtopicfile | fullditamapandtopic.list | fullditamapandtopiclist | All of the ditamap and topic files that are referenced during the transformation. These may be referenced by href or conref attributes.
fullditamapfile | fullditamap.list | fullditamaplist | All of the ditamap files in dita.list
fullditatopicfile | fullditatopic.list | fullditatopiclist | All of the topic files in dita.list
hrefditatopicfile | hrefditatopic.list | hrefditatopiclist | All of the topic files that are referenced with an href attribute
hreftargetsfile | hreftargets.list | hreftargetslist | Link targets
htmlfile | html.list | htmllist | Resource files
imagefile | image.list | imagelist | Image files that are referenced in the content
keyreffile | keyref.list | keyreflist | Topics and maps that have key references
outditafilesfile | outditafiles.list | outditafileslist |
relflagimagefile | relflagimage.list | relflagimagelist |
resourceonlyfile | resourceonly.list | resourceonlylist |
skipchunkfile | skipchunk.list | skipchunklist |
subjectschemefile | subjectscheme.list | subjectschemelist |
subtargetsfile | subtargets.list | subtargetslist |
tempdirToinputmapdir.relative.value | | |
uplevels | | |
user.input.dir | | | Absolute input directory path
user.input.file.listfile | | | Input file list file
user.input.file | | | Input file path, relative to input directory
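To make the idea of list generation concrete, here is a minimal Python sketch of how a scanning step might sort documents into a few of the lists named above. It is not the toolkit's Ant/Java implementation: the `build_lists` function, the in-memory `docs` dictionary, and the choice of which elements to inspect are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def build_lists(docs):
    """Sort source documents into a few gen-list style lists.

    docs maps a file name to its XML source text. This in-memory input is
    a hypothetical stand-in for reading files from the input directory.
    """
    lists = {"conref.list": set(), "image.list": set(),
             "hrefditatopic.list": set()}
    for name, source in docs.items():
        for elem in ET.fromstring(source).iter():
            # Only documents that use @conref need the conref step.
            if "conref" in elem.attrib:
                lists["conref.list"].add(name)
            # Images referenced in the content.
            if elem.tag == "image" and elem.get("href"):
                lists["image.list"].add(elem.get("href"))
            # Topics referenced with href from a map (fragment removed).
            if elem.tag == "topicref" and elem.get("href"):
                lists["hrefditatopic.list"].add(elem.get("href").split("#")[0])
    return {key: sorted(values) for key, values in lists.items()}

docs = {
    "a.dita": '<topic id="a"><body><p conref="b.dita#b/p1"/>'
              '<image href="pic.png"/></body></topic>',
    "b.dita": '<topic id="b"><body><p id="p1">Reusable text</p></body></topic>',
    "m.ditamap": '<map><topicref href="a.dita"/><topicref href="b.dita#b"/></map>',
}
lists = build_lists(docs)
print(lists["conref.list"])         # ['a.dita']
print(lists["hrefditatopic.list"])  # ['a.dita', 'b.dita']
```

Because the conref list holds only a.dita, a later conref step driven by this list would skip b.dita entirely, which is the point of building the lists up front.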

Debug and filter (debug-filter)

The debug-filter step processes all referenced DITA content and creates copies in a temporary directory for use during the remainder of the build. As the files are copied, the following modifications are made:

  • The files are filtered according to entries in any specified DITAVAL file.
  • Debug information is inserted into each element (using the xtrf and xtrc attributes). These values allow messages later in the build to reliably indicate the original source of the error; for example, a message may trace back to the fifth <ph> element in a specific source document. Without these attributes, that count may no longer be available due to filtering and other processing.
  • Column names in tables are adjusted to use a common naming scheme. This is done only to simplify later conref processing; for example, if a table row is pulled into another table, this ensures that a reference to "column 5 properties" will continue to work in the fifth column of the new table.

This step is implemented in Java.
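The first two modifications can be sketched in Python as follows: elements are numbered in source order and stamped with debug attributes before filtering, so the counts still refer to the original document after excluded content is removed. This is a simplified illustration, not the Java module. The `debug_filter` name, the `(attribute, value)` form of the exclude rules, the single-valued attribute check, and the `name:count` value used for xtrc are all assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def _remove_keep_tail(parent, child):
    """Remove child but keep any text that followed it."""
    idx = list(parent).index(child)
    if child.tail:
        if idx > 0:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + child.tail
        else:
            parent.text = (parent.text or "") + child.tail
    parent.remove(child)

def debug_filter(source, src_file, excludes):
    """Return a filtered copy of a topic with xtrf/xtrc debug attributes.

    excludes: set of (attribute, value) pairs that a DITAVAL file marks
    with action="exclude" (hypothetical, simplified input form).
    """
    root = ET.fromstring(source)
    counts = Counter()
    # Number elements in document order before filtering, so that the
    # counts describe positions in the original source document.
    for elem in root.iter():
        counts[elem.tag] += 1
        elem.set("xtrf", src_file)
        elem.set("xtrc", f"{elem.tag}:{counts[elem.tag]}")  # illustrative format
    # Remove excluded elements (each attribute assumed to hold one value).
    for parent in list(root.iter()):
        for child in list(parent):
            if any(child.get(att) == val for att, val in excludes):
                _remove_keep_tail(parent, child)
    return root

topic = ('<topic id="t"><body><p audience="admin">Admin only</p>'
         '<p>Shared</p></body></topic>')
result = debug_filter(topic, "t.dita", {("audience", "admin")})
print(result.find(".//p").get("xtrc"))  # p:2
```

The remaining paragraph is still reported as the second <p> of t.dita, even though it is now the first one left in the filtered copy.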

Copy related files (copy-files)

The copy-files step copies related non-DITA resources to the output directory, such as HTML files referenced in a map or images referenced by DITAVAL files.

Conref push (conrefpush)

The conrefpush step resolves "conref push" references. The conref push feature was added in the DITA 1.2 specification, and the associated processing is available in DITA-OT version 1.5 and later. This step only processes documents that use conref push (or that are updated due to the push action). The step is implemented in Java.

Conref (conref)

The conref step resolves traditional conref attributes, processing only the documents that use the conref attribute. Each map or topic is processed with XSLT to resolve the attributes.

As part of the process, IDs within referenced content are changed as they are pulled into the new location. This is done in order to ensure that IDs within the original (referencing) topic remain unique.

If an element with an ID is pulled into a new context along with a cross reference that references the target, both the ID and the reference are updated so that they remain valid in the new location. For example, a referenced topic may include a section as in the following example.

<topic id="referenced_topic">
  <title>...</title>
  <body>
    <section id="sect"><title>Sample section</title>
      <p>Look at the next figure <xref href="#referenced_topic/fig">here</xref>.</p>
      <fig id="fig"><title>Sample</title>
        <p>This is a rather useless figure, but it
           illustrates a point.</p>
      </fig>
    </section>
  </body>
</topic>

If the section is referenced with a conref attribute, the ID on the <fig> element will be modified to ensure it remains unique inside the new topic. At the same time, the <xref> element will also be modified so that after the conref is resolved, it remains valid as a local reference. If the topic pulling in a new copy of the section has the id "new_topic", then the pulled copy of the section may look something like this in the intermediate document.

<section><title>Sample section</title>
  <p>Look at the next figure <xref href="#new_topic/d1e25">here</xref>.</p>
  <fig id="d1e25"><title>Sample</title>
    <p>This is a rather useless figure, but it
       illustrates a point.</p>
  </fig>
</section>

In this case, the ID of the figure has been changed to a generated value of "d1e25". At the same time, the <xref> element has been updated to use that new generated ID, so that the reference stays local in the updated topic.
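The following Python sketch mimics this behavior for a pulled element: every ID inside the copy is replaced with a generated value, and any <xref> that pointed at one of those IDs in the source topic is rewritten to use the new topic ID and the generated ID. The `pull_with_new_ids` name and the `d1eN` numbering are assumptions; the real step is XSLT with its own ID generator. The pulled root's own ID is dropped, as in the example above.

```python
import copy
import itertools
import xml.etree.ElementTree as ET

def pull_with_new_ids(element, source_topic_id, new_topic_id, start=1):
    """Copy a conref target, regenerate its IDs, and fix local xrefs."""
    counter = itertools.count(start)  # hypothetical ID generator
    pulled = copy.deepcopy(element)
    pulled.attrib.pop("id", None)  # as in the example, the pulled root loses its ID
    id_map = {}
    for elem in pulled.iter():
        if "id" in elem.attrib:
            new_id = f"d1e{next(counter)}"
            id_map[elem.get("id")] = new_id
            elem.set("id", new_id)
    # Rewrite references of the form "#source_topic/old_id".
    prefix = f"#{source_topic_id}/"
    for xref in pulled.iter("xref"):
        href = xref.get("href", "")
        old = href[len(prefix):] if href.startswith(prefix) else None
        if old in id_map:
            xref.set("href", f"#{new_topic_id}/{id_map[old]}")
    return pulled

section = ET.fromstring(
    '<section id="sect"><title>Sample section</title>'
    '<p>Look at the next figure <xref href="#referenced_topic/fig">here</xref>.</p>'
    '<fig id="fig"><title>Sample</title></fig></section>')
pulled = pull_with_new_ids(section, "referenced_topic", "new_topic")
print(ET.tostring(pulled, encoding="unicode"))
```

Building the old-to-new ID map before touching any references is what keeps the cross reference and its target consistent.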

Move metadata (move-meta-entries)

The move-meta-entries step pushes metadata back and forth between maps and topics. For example, index entries and copyrights in the map are pushed into affected topics, so that topics may be processed later in isolation while retaining all relevant metadata.
This step is implemented in Java.
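As a rough illustration, the Python sketch below pushes map-level metadata into the topics it affects, so each topic carries that metadata on its own. The dictionary model, the `push_map_metadata` name, and the precedence rules (a copyright already in the topic is kept; index entries are merged) are assumptions made for this sketch, not a description of the Java module's data structures or rules.

```python
def push_map_metadata(map_meta, topics):
    """Push map-level metadata into the topics it affects.

    map_meta: e.g. {"copyright": "...", "indexterms": ["..."]}
    topics:   topic file -> metadata dict, updated in place.
    Both structures are simplified, hypothetical stand-ins for XML.
    """
    for meta in topics.values():
        # Assumed rule: a copyright already declared in the topic is kept.
        if "copyright" in map_meta:
            meta.setdefault("copyright", map_meta["copyright"])
        # Index entries from the map are added after the topic's own entries.
        terms = meta.setdefault("indexterms", [])
        for term in map_meta.get("indexterms", []):
            if term not in terms:
                terms.append(term)
    return topics

topics = {"a.dita": {"indexterms": ["installing"]},
          "b.dita": {"copyright": "2011 Topic Owner"}}
push_map_metadata({"copyright": "2012 Map Owner", "indexterms": ["setup"]}, topics)
print(topics["a.dita"])
```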

Resolve keyref (keyref)

The keyref step examines all keys defined in the source material and updates key references appropriately. Links that make use of keys are updated so that any href value is replaced by the appropriate target; key-based text replacement is also evaluated. The keyref mechanism was defined as part of the DITA 1.2 standard, and is available in DITA-OT 1.5 and later.

This step is implemented in Java.
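Both halves of key resolution can be sketched in Python: collecting key definitions from a map, where in a flat map the first definition of a key wins, and then resolving @keyref on links and text. The function names and the flat map are assumptions; submaps and key precedence across maps are not modeled.

```python
import xml.etree.ElementTree as ET

def collect_keys(map_root):
    """Map each key name to its defining element; the first definition wins."""
    keys = {}
    for ref in map_root.iter():
        for name in ref.get("keys", "").split():
            keys.setdefault(name, ref)
    return keys

def resolve_keyrefs(topic_root, keys):
    """Resolve @keyref on links (href) and on empty elements (text)."""
    for elem in topic_root.iter():
        definition = keys.get(elem.get("keyref"))
        if definition is None:
            continue
        # Links: the key's target replaces any href value.
        if definition.get("href"):
            elem.set("href", definition.get("href"))
        # Key-based text replacement, only when the element has no text.
        keyword = definition.find("./topicmeta/keywords/keyword")
        if keyword is not None and not (elem.text or "").strip():
            elem.text = keyword.text
    return topic_root

ditamap = ET.fromstring(
    '<map><keydef keys="product"><topicmeta><keywords>'
    '<keyword>WidgetPro</keyword></keywords></topicmeta></keydef>'
    '<keydef keys="install" href="install.dita"/>'
    '<keydef keys="install" href="old-install.dita"/></map>')
topic = ET.fromstring(
    '<topic id="t"><body><p>Use <keyword keyref="product"/>; '
    'see <xref keyref="install"/>.</p></body></topic>')
resolve_keyrefs(topic, collect_keys(ditamap))
print(topic.find(".//xref").get("href"))  # install.dita
```

The second definition of "install" is ignored because a definition for that key was already collected.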

Resolve code references (coderef)

The coderef module resolves references made with the <coderef> element, which was added in DITA 1.2. This module is available in DITA-OT 1.5 and later.

The <coderef> element is used inside <codeblock> to reference code stored externally in non-XML documents. During the preprocess step, this Java module pulls the referenced content into the <codeblock> element.
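The core idea fits in a short Python sketch: each <coderef> inside a <codeblock> is replaced by the text of the file it points to. The `resolve_coderefs` name, resolving @href against a base directory, and reading files as UTF-8 are assumptions; the Java module's encoding handling is not reproduced.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def resolve_coderefs(topic_root, base_dir):
    """Replace <coderef href="..."/> children of <codeblock> with file text."""
    for block in list(topic_root.iter("codeblock")):
        for ref in list(block.findall("coderef")):
            path = os.path.join(base_dir, ref.get("href"))
            with open(path, encoding="utf-8") as handle:  # assumed encoding
                code = handle.read()
            # Put the code text where the coderef stood, keeping its tail text.
            idx = list(block).index(ref)
            text = code + (ref.tail or "")
            if idx == 0:
                block.text = (block.text or "") + text
            else:
                prev = block[idx - 1]
                prev.tail = (prev.tail or "") + text
            block.remove(ref)
    return topic_root

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "hello.py"), "w", encoding="utf-8") as out:
        out.write('print("hello")\n')
    topic = ET.fromstring('<topic id="t"><body><codeblock>'
                          '<coderef href="hello.py"/></codeblock></body></topic>')
    resolve_coderefs(topic, tmp)
print(topic.find(".//codeblock").text)
```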

Resolve map references (mapref)

The mapref module resolves references from one map to another.

Maps may reference other maps using markup similar to the following:

<topicref href="other.ditamap" format="ditamap"/>

The DITA 1.2 standard added a new element that allows this sort of reference without setting the format attribute:

<mapref href="other.ditamap"/>

In either case, the element that references the other map is replaced by the topic references from the other map. Relationship tables are pulled into the referencing map as a child of the root element (<map> or a specialization of <map>).

This step is implemented in XSLT.
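A Python sketch of this replacement is shown below, assuming the referenced maps are already parsed and available by file name. The `resolve_maprefs` name and the `maps` dictionary are illustrative; the shipped step is XSLT. Nested map references are resolved first, topic references replace the referencing element, and relationship tables move under the root.

```python
import copy
import xml.etree.ElementTree as ET

def _is_map_ref(elem):
    """True for <mapref> or a <topicref> with format="ditamap"."""
    return elem.tag == "mapref" or (
        elem.tag == "topicref" and elem.get("format") == "ditamap")

def resolve_maprefs(map_root, maps):
    """Inline referenced maps; maps maps a file name to a parsed <map>."""
    reltables = []

    def expand(parent):
        children = []
        for child in list(parent):
            if _is_map_ref(child):
                submap = copy.deepcopy(maps[child.get("href")])
                expand(submap)  # resolve nested map references first
                for sub in submap:
                    if sub.tag == "reltable":
                        reltables.append(sub)  # moved under the root later
                    elif sub.tag not in ("title", "topicmeta"):
                        children.append(sub)
            else:
                if child.tag != "reltable":
                    expand(child)
                children.append(child)
        for child in list(parent):
            parent.remove(child)
        parent.extend(children)

    expand(map_root)
    map_root.extend(reltables)
    return map_root

maps = {"other.ditamap": ET.fromstring(
    '<map><title>Other</title><topicref href="b.dita"/>'
    '<reltable><relrow/></reltable></map>')}
main = ET.fromstring(
    '<map><topicref href="a.dita"/><mapref href="other.ditamap"/></map>')
resolve_maprefs(main, maps)
print([child.tag for child in main])  # ['topicref', 'topicref', 'reltable']
```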

Pull content into maps (mappull)

The mappull step pulls content from referenced topics into maps, and cascades data within maps.

This step uses XSLT to make the following changes to the map:

  • Pull titles from referenced DITA topics. This step replaces the navigation title specified on the topicref. If the locktitle attribute is set to "yes", the value in the map is not changed.
  • The <linktext> element is set based on the title of the referenced topic, unless it is already specified locally.
  • The <shortdesc> element is set based on the short description of the referenced topic, unless it is already specified locally.
  • When a local DITA topic is referenced, the type attribute is set on the topicref based on the type of topic referenced. For example, a reference to a task topic will end up with type="task".
  • Inheritable attributes, such as toc or print, are made explicit on child topicref elements. This allows any future step to work with the attributes directly, without reevaluating the cascade behavior.
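Two of these changes, pulling titles unless @locktitle is "yes" and making inherited attributes explicit, can be sketched in Python as follows. The `topic_titles` lookup (standing in for reading each referenced topic), the two-attribute cascade list, and storing the pulled title in a @navtitle attribute are simplifying assumptions, not the XSLT step's actual behavior.

```python
import xml.etree.ElementTree as ET

INHERITED = ("toc", "print")  # illustrative subset of cascading attributes

def map_pull(map_root, topic_titles):
    """topic_titles: href -> title of the referenced topic (assumed input)."""

    def visit(node, inherited):
        for ref in node.findall("topicref"):
            values = dict(inherited)
            for att in INHERITED:
                if ref.get(att) is not None:
                    values[att] = ref.get(att)  # local value overrides
                elif att in values:
                    ref.set(att, values[att])   # make the inherited value explicit
            href = ref.get("href")
            if href in topic_titles and ref.get("locktitle") != "yes":
                ref.set("navtitle", topic_titles[href])
            visit(ref, values)

    visit(map_root, {})
    return map_root

ditamap = ET.fromstring(
    '<map><topicref href="a.dita" toc="no">'
    '<topicref href="b.dita" navtitle="Custom" locktitle="yes"/>'
    '<topicref href="c.dita" toc="yes"/></topicref></map>')
map_pull(ditamap, {"a.dita": "Topic A", "b.dita": "Topic B", "c.dita": "Topic C"})
```

After this pass, a later step can read @toc on any topicref directly instead of walking up the hierarchy.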

Chunk topics (chunk)

The chunk step is a Java module that breaks apart and assembles referenced DITA content based on the chunk attribute in maps.

The following values are recognized on the chunk attribute, based on definitions provided in the DITA specification. These values were initially defined in the DITA 1.1 specification, with significant clarifications in the DITA 1.2 specification.

  • select-topic
  • select-document
  • select-branch
  • by-topic
  • by-document
  • to-content
  • to-navigation

Map based linking (maplink and move-links)

These two steps work together to create links based on a map and move those links into referenced topics. The links are created based on hierarchy (parent/child), the collection-type attribute (sequential or family links), and relationship tables.

The maplink module first runs an XSLT program that evaluates the map, and places all generated links into a single file in the temporary processing directory. Once that file is created, the move-links module runs a Java program that pushes the generated links into the proper topics.
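The link-generation half can be sketched in Python: walking the map yields parent and child links from the hierarchy, next/previous links among children of a topicref with collection-type="sequence", and sibling links for collection-type="family". Relationship tables are omitted. The `generate_links` name and the `(role, target)` tuple format are assumptions; the real step writes its links to an intermediate file for move-links.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def generate_links(map_root):
    """Return {topic href: [(role, target href), ...]} from map structure."""
    links = defaultdict(list)

    def visit(parent):
        children = parent.findall("topicref")
        if parent.get("href"):
            for child in children:
                links[child.get("href")].append(("parent", parent.get("href")))
                links[parent.get("href")].append(("child", child.get("href")))
        kind = parent.get("collection-type")
        if kind == "sequence":
            for prev, nxt in zip(children, children[1:]):
                links[prev.get("href")].append(("next", nxt.get("href")))
                links[nxt.get("href")].append(("previous", prev.get("href")))
        elif kind == "family":
            for child in children:
                for other in children:
                    if other is not child:
                        links[child.get("href")].append(("sibling", other.get("href")))
        for child in children:
            visit(child)

    visit(map_root)
    return dict(links)

ditamap = ET.fromstring(
    '<map><topicref href="install.dita" collection-type="sequence">'
    '<topicref href="step1.dita"/><topicref href="step2.dita"/>'
    '</topicref></map>')
links = generate_links(ditamap)
print(links["step1.dita"])  # [('parent', 'install.dita'), ('next', 'step2.dita')]
```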

Pull content into topics (topicpull)

The topicpull module pulls content into <xref> and <link> elements (if needed).

For <xref> elements, if the <xref> does not contain link text, the target is examined and link text is pulled. For example, a reference to a topic will pull the title of the topic; a reference to a list item will pull the number of the item. If the <xref> element references a topic that has a short description, and the <xref> element does not already contain a child <desc> element, a <desc> element is created with the short description of the target.

The process is similar for <link> elements. If the <link> does not have a child <linktext> element, one is created with the appropriate link text. Similarly, if the <link> element does not have a child <desc> element, and the short description of the target can be determined, a <desc> is created with the short description of the target.

This step is implemented in XSLT.
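The <xref> half of this step might look like the following Python sketch. The `targets` dictionary stands in for reading each target topic's title and short description, and the `pull_into_xrefs` name is hypothetical; list-item numbering and other target types are not handled.

```python
import xml.etree.ElementTree as ET

def pull_into_xrefs(topic_root, targets):
    """targets: href -> (title, shortdesc or None) for referenced topics."""
    for xref in list(topic_root.iter("xref")):
        target = targets.get(xref.get("href"))
        if target is None:
            continue
        title, shortdesc = target
        # Pull link text only when the xref has none of its own.
        if not (xref.text or "").strip():
            xref.text = title
        # Add <desc> only when the target has a short description and none exists.
        if shortdesc and xref.find("desc") is None:
            ET.SubElement(xref, "desc").text = shortdesc
    return topic_root

topic = ET.fromstring(
    '<topic id="t"><body><p>See <xref href="install.dita"/> and '
    '<xref href="faq.dita">the FAQ</xref>.</p></body></topic>')
pull_into_xrefs(topic, {"install.dita": ("Installing", "How to install."),
                        "faq.dita": ("FAQ", None)})
```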

Flagging in the toolkit

Beginning with DITA-OT 1.7, flagging support is implemented as a common preprocess module. The module evaluates the DITAVAL against all flagging attributes, and adds DITA-OT specific hints into the topic when flags are active. Any extended transform type may use these hints to support flagging without adding logic to interpret the DITAVAL.

Evaluating the DITAVAL flags

Flagging is implemented as a reusable module during the preprocess stage. If a DITAVAL file is not used with a build, this step is skipped with no change to the file.

When a flag is active, relevant sections of the DITAVAL itself are copied into the topic as children of the flagged element. The active flags are enclosed in a pseudo-specialization of the <foreign> element (referred to as a pseudo-specialization because it is used only under the covers, with all topic types; it is not integrated into any shipped document types).

<ditaval-startprop>
When any flag is active on an element, a <ditaval-startprop> element will be created as the first child of the flagged element:
<ditaval-startprop class="+ topic/foreign ditaot-d/ditaval-startprop ">

The <ditaval-startprop> element will contain the following:

  • If the active flags should create a new style, that style is included using standard CSS markup on the @outputclass attribute. Output types that make use of CSS, such as XHTML, can use this value as-is.
  • If styles conflict, and a <style-conflict> element exists in the DITAVAL, it will be copied as a child of <ditaval-startprop>.
  • Any <prop> or <revprop> elements that define active flags will be copied in as children of the <ditaval-startprop> element. Any <startflag> children of the properties will be included, but <endflag> children will not.

<ditaval-endprop>
When any flag is active on an element, a <ditaval-endprop> element will be created as the last child of the flagged element:
<ditaval-endprop class="+ topic/foreign ditaot-d/ditaval-endprop ">

CSS values and <style-conflict> elements are not included on this element.

Any <prop> or <revprop> elements that define active flags will be copied in as children of <ditaval-endprop>. Any <endflag> children of the properties will be included, but <startflag> children will not.
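The Python sketch below builds these two elements for a flagged element from its active <prop>/<revprop> definitions. It copies each definition, keeps only <startflag> at the start and only <endflag> at the end, and puts properties before revisions at the start and the reverse at the end. Computing @outputclass is reduced to background color only, with the <style-conflict> color used when backgrounds disagree; that simplification and the `add_flag_markers` name are assumptions, not the toolkit's CSS logic.

```python
import copy
import xml.etree.ElementTree as ET

START_CLASS = "+ topic/foreign ditaot-d/ditaval-startprop "
END_CLASS = "+ topic/foreign ditaot-d/ditaval-endprop "

def _copy_without(prop, flag_tag):
    """Deep-copy a DITAVAL property, dropping one kind of image flag."""
    clone = copy.deepcopy(prop)
    for flag in clone.findall(flag_tag):
        clone.remove(flag)
    return clone

def add_flag_markers(elem, active, style_conflict=None):
    """Add ditaval-startprop/endprop as first and last children of elem."""
    if not active:
        return elem
    # Properties before revisions at the start; the order is reversed at the end.
    ordered = ([p for p in active if p.tag == "prop"]
               + [p for p in active if p.tag == "revprop"])
    start = ET.Element("ditaval-startprop", {"class": START_CLASS})
    end = ET.Element("ditaval-endprop", {"class": END_CLASS})
    colors = {p.get("backcolor") for p in ordered if p.get("backcolor")}
    if len(colors) == 1:
        start.set("outputclass", f"background-color:{colors.pop()};")
    elif len(colors) > 1 and style_conflict is not None:
        color = style_conflict.get("background-conflict-color")
        start.set("outputclass", f"background-color:{color};")
        start.append(copy.deepcopy(style_conflict))
    for prop in ordered:
        start.append(_copy_without(prop, "endflag"))
    for prop in reversed(ordered):
        end.append(_copy_without(prop, "startflag"))
    # Leading text now follows the start marker.
    start.tail, elem.text = elem.text, None
    elem.insert(0, start)
    elem.append(end)
    return elem

ditaval = ET.fromstring(
    '<val><style-conflict background-conflict-color="red"/>'
    '<prop action="flag" att="audience" val="user" backcolor="green"/>'
    '<prop action="flag" att="platform" val="win" backcolor="blue"/></val>')
para = ET.fromstring('<p audience="user" platform="win">Conflicting styles</p>')
active = [p for p in ditaval.findall("prop") if para.get(p.get("att")) == p.get("val")]
add_flag_markers(para, active, ditaval.find("style-conflict"))
```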

Supporting flags in overrides or custom transform types

For most transform types, the <foreign> element should be ignored by default, because arbitrary non-DITA content may not mix well unless coded for ahead of time. If the <foreign> element is ignored by default, or if a rule is added to specifically ignore <ditaval-startprop> and <ditaval-endprop>, then the added elements will have no impact on a transform. If desired, flagging support may be integrated at any time in the future.

The processing described above runs as part of the common preprocess, so any transform that uses the default preprocess will get the topic updates. To support generating flags as images, XSLT based transforms can use default fallthrough processing in most cases. For example, if a paragraph is flagged, the first child of <p> will contain the start flag information; adding a rule to handle images in <ditaval-startprop> will cause the image to appear at the start of the paragraph content.

In some cases fallthrough processing will not result in valid output; for those cases, the flags must be explicitly processed. This is done in the XHTML transform for elements like <ol>, because fallthrough processing would place images in between <ol> and <li>. To handle this, the code processes <ditaval-startprop> before starting the element, and <ditaval-endprop> at the end. Fallthrough processing is then disabled for those elements as children of <ol>.
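The difference between the two approaches can be seen in a small Python renderer sketch (not the XSLT code). For most elements it renders flag markers in document order, so a start image lands inside the output element; for <ol> and <ul> it renders them before and after the list, so no <img> sits between <ol> and <li>. The `render` function, the `OUT_OF_LINE` set, and the bare HTML output are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

OUT_OF_LINE = {"ol", "ul"}  # elements whose flags must sit outside the tag
MARKERS = ("ditaval-startprop", "ditaval-endprop")

def flag_images(marker):
    """Render every startflag/endflag image found inside a marker element."""
    return "".join(f'<img src="{escape(flag.get("imageref"))}"/>'
                   for flag in marker.iter()
                   if flag.tag in ("startflag", "endflag") and flag.get("imageref"))

def render(elem):
    tag = elem.tag
    if tag in MARKERS:
        # Fallthrough: the marker becomes images in place.
        return flag_images(elem) + escape(elem.tail or "")
    parts = [escape(elem.text or "")]
    before = after = ""
    for child in elem:
        if tag in OUT_OF_LINE and child.tag in MARKERS:
            # Out-of-line: flags are emitted outside the element.
            if child.tag == MARKERS[0]:
                before = flag_images(child)
            else:
                after = flag_images(child)
            parts.append(escape(child.tail or ""))
        else:
            parts.append(render(child))
    return f"{before}<{tag}>{''.join(parts)}</{tag}>{after}" + escape(elem.tail or "")

para = ET.fromstring(
    '<p><ditaval-startprop><prop><startflag imageref="start.png"/></prop>'
    '</ditaval-startprop>Text<ditaval-endprop><prop>'
    '<endflag imageref="end.png"/></prop></ditaval-endprop></p>')
print(render(para))  # <p><img src="start.png"/>Text<img src="end.png"/></p>
ol = ET.fromstring(
    '<ol><ditaval-startprop><prop><startflag imageref="start.png"/></prop>'
    '</ditaval-startprop><li>Item</li><ditaval-endprop><prop>'
    '<endflag imageref="end.png"/></prop></ditaval-endprop></ol>')
print(render(ol))  # <img src="start.png"/><ol><li>Item</li></ol><img src="end.png"/>
```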

Example DITAVAL

Assume the following DITAVAL file is in use during a build. This DITAVAL will be used for each of the following content examples.

<?xml version="1.0" encoding="UTF-8"?>
<val>
  <!-- Define what happens in the case of conflicting styles -->
  <style-conflict background-conflict-color="red"/>

  <!-- Define two flagging properties that give styles (no image) -->
  <prop action="flag" att="audience" style="underline" val="user" backcolor="green"/>
  <prop action="flag" att="platform" style="overline" val="win" backcolor="blue"/>

  <!-- Define a property that includes start and end image flags -->
  <prop action="flag" att="platform" val="linux" style="overline" backcolor="blue">
    <startflag imageref="startlin.png"><alt-text>Start linux</alt-text></startflag>
    <endflag imageref="endlin.png"><alt-text>End linux</alt-text></endflag>
  </prop>

  <!-- Define a revision that includes start and end image flags -->
  <revprop action="flag" style="double-underline" val="rev2">
    <startflag imageref="start_rev.gif"><alt-text>ssssssssssstart</alt-text></startflag>
    <endflag imageref="end_rev.gif"><alt-text>eeeeeeeeeeeeeend</alt-text></endflag>
  </revprop>
</val>

Content example 1: adding style

Now assume the following paragraph exists in a topic. Class, @xtrf, and @xtrc attributes are left off for clarity.

<p audience="user">Simple user; includes style but no images</p>

Based on the DITAVAL above, audience="user" results in a style with underlining and with a green background. The interpreted CSS value is added to @outputclass on <ditaval-startprop>, and the actual property definition is included at the start and end of the element.

The resulting file after the flagging step looks like this; for clarity, newlines are added, class attributes are shown as they would appear in the temporary file, and @xtrf and @xtrc are removed:

<p audience="user" class="- topic/p ">
  <ditaval-startprop class="+ topic/foreign ditaot-d/ditaval-startprop " 
           outputclass="background-color:green;text-decoration:underline;">
    <prop action="flag" att="audience" style="underline" val="user" backcolor="green"/>
  </ditaval-startprop>
  Simple user; includes style but no images
  <ditaval-endprop class="+ topic/foreign ditaot-d/ditaval-endprop ">
    <prop action="flag" att="audience" style="underline" val="user" backcolor="green"/>
  </ditaval-endprop>
</p>

Content example 2: conflicting styles

This example includes a paragraph with conflicting styles. When the audience and platform attributes are both evaluated, the DITAVAL indicates that the background color is both green and blue. In this situation, the <style-conflict> element is evaluated to determine how to style the content.

<p audience="user" platform="win">Conflicting styles (still no images)</p>

The <style-conflict> element results in a background color of red, so this value is added to @outputclass on <ditaval-startprop>. As above, active properties are copied into the generated elements; the <style-conflict> element itself is also copied into the generated <ditaval-startprop> element.

The resulting file after the flagging step looks like this; for clarity, newlines are added, while @xtrf and @xtrc are removed:

<p audience="user" platform="win" class="- topic/p ">
  <ditaval-startprop class="+ topic/foreign ditaot-d/ditaval-startprop " 
           outputclass="background-color:red;">
    <style-conflict background-conflict-color="red"/>
    <prop action="flag" att="audience" style="underline" val="user" backcolor="green"/>
    <prop action="flag" att="platform" style="overline" val="win" backcolor="blue"/>
  </ditaval-startprop>
  Conflicting styles (still no images)
  <ditaval-endprop class="+ topic/foreign ditaot-d/ditaval-endprop ">
    <prop action="flag" att="platform" style="overline" val="win" backcolor="blue"/>
    <prop action="flag" att="audience" style="underline" val="user" backcolor="green"/>
  </ditaval-endprop>
</p>

Content example 3: adding image flags

This example includes image flags for both @platform and @rev, which are defined in DITAVAL <prop> and <revprop> elements.

<ol platform="linux" rev="rev2">
  <li>Generate images for platform="linux" and rev="2"</li>
</ol>

As above, the <ditaval-startprop> and <ditaval-endprop> nest the active property definitions, with the calculated CSS value on @outputclass. The <ditaval-startprop> drops the ending image, and <ditaval-endprop> drops the starting image. To make document-order processing more consistent, property flags are always included before revisions in <ditaval-startprop>, and the order is reversed for <ditaval-endprop>.

The resulting file after the flagging step looks like this; for clarity, newlines are added, while @xtrf and @xtrc are removed:

<ol platform="linux" rev="rev2" class="- topic/ol ">
  <ditaval-startprop class="+ topic/foreign ditaot-d/ditaval-startprop " 
           outputclass="background-color:blue;text-decoration:underline;text-decoration:overline;">
    <prop action="flag" att="platform" val="linux" style="overline" backcolor="blue">
      <startflag imageref="startlin.png"><alt-text>Start linux</alt-text></startflag>
    </prop>
    <revprop action="flag" style="double-underline" val="rev2">
      <startflag imageref="start_rev.gif"><alt-text>ssssssssssstart</alt-text></startflag>
    </revprop>
  </ditaval-startprop>
  <li class="- topic/li ">Generate images for platform="linux" and rev="2"</li>
  <ditaval-endprop class="+ topic/foreign ditaot-d/ditaval-endprop ">
    <revprop action="flag" style="double-underline" val="rev2">
      <endflag imageref="end_rev.gif"><alt-text>eeeeeeeeeeeeeend</alt-text></endflag>
    </revprop>
    <prop action="flag" att="platform" val="linux" style="overline" backcolor="blue">
      <endflag imageref="endlin.png"><alt-text>End linux</alt-text></endflag>
    </prop>
  </ditaval-endprop>
</ol>

XHTML migration for flagging updates in DITA-OT 1.7

This topic is primarily of interest to developers with XHTML transform overrides written prior to DITA-OT 1.7. Due to significant changes in the flagging process with the 1.7 release, some changes may be needed to make overrides work properly with DITAVAL based flagging. The new design is significantly simpler than the old design; in many cases, migration will consist of deleting old code that is no longer needed.

Which XHTML overrides need to migrate?

If your override does not contain any code related to DITAVAL flagging, then there is nothing to migrate.

If your builds do not make use of DITAVAL based flagging, but your override calls the deprecated flagging templates, you should still migrate the override, though there is little urgency. You will not see any difference in the output, but those templates will be removed in a future release.

If you do make use of DITAVAL based flagging, try using your override with 1.7. Check the elements you override:

  1. In some cases flags may be doubled. This will be the case if you call routines such as "start-flagit".
  2. In some cases flags may be removed. This will be the case if you call shortcut routines such as "revtext" or "revblock".
  3. In other cases, flags may still appear properly, in which case migration is less urgent.

For any override that needs migration, see the instructions that follow.

Deprecated templates in DITA-OT 1.7

All of the old DITAVAL based templates are deprecated in DITA-OT 1.7. If your overrides include any of the following templates, they should be migrated for the new release; in many cases the templates below will not have any effect on your output, but all instances should be migrated.

  • The "gen-style" template used to add CSS styling
  • The "start-flagit" and "end-flagit" templates used to generate image flags based on property attributes like @audience
  • The "start-revflag" and "end-revflag" templates, used to generate images for active revisions
  • Shortcut templates that group these templates into a single call, such as:
    • "start-flags-and-rev" and "end-flags-and-rev", used to combine flags and revisions into one call
    • "revblock" and "revtext", both used to output start revisions, element content, and end revisions
    • The modes "outputContentsWithFlags" and "outputContentsWithFlagsAndStyle", both used to combine processing for property/revision flags with content processing
  • All other templates that make use of the $flagrules variable, which is no longer used in any of the DITA-OT 1.7 code
  • All templates within flag.xsl that were called from the templates listed above
  • Element processing handled with mode="elementname-fmt", such as mode="ul-fmt" for processing unordered lists and mode="section-fmt" for sections.

What replaces the templates?

The new flagging design described in the preprocess design section now adds literal copies of relevant DITAVAL elements, along with CSS based flagging information, into the relevant section of the topic. This allows most flags to be processed in document order; in addition, there is never a need to read the DITAVAL, interpret CSS, or evaluate flagging logic. The htmlflag.xsl file contains a few rules to match and process the start/end flags; in most cases, all code to explicitly process flags can be deleted.

For example, the common logic for most element rules before DITA-OT 1.7 could be boiled down to the following:

Match element
    Create "flagrules" variable by reading DITAVAL for active flags
    Output start tag such as <div> or <span>
        Call "commonattributes" and ID processing
        Call "gen-style" with $flagrules, to create DITAVAL based CSS
        Call "start-flagit" with $flagrules, to create start flag images
        Call "start-revflag" with $flagrules, to create start revision images
        Output contents
        Call "end-revflag" with $flagrules, to create end revision images
        Call "end-flagit" with $flagrules, to create end flag images
    Output end tag such as </div> or </span>

In DITA-OT 1.7, style and images are typically handled with XSLT fallthrough processing. This removes virtually all special flag coding from element rules, because flags are already part of the document and processed in document order. The sample above is reduced to:

Match element
   Output start tag such as <div> or <span>
      Call "commonattributes" and ID processing
      Output contents
   Output end tag such as </div> or </span>

Migrating "gen-style" named template

Calls to the "gen-style" template should be deleted. There is no need to replace this call for most elements.

The "gen-style" template was designed to read a DITAVAL file, find active style-based flagging (such as colored or bold text), and add it to the generated @style attribute in HTML.

With DITA-OT 1.7, the style is calculated in the pre-process flagging module. The result is created as @outputclass on a <ditaval-startprop> sub-element. The "commonattributes" template now includes a line to process that value; the result is that for every element that calls "commonattributes", DITAVAL style will be processed when needed. Because virtually every element includes a call to this common template, there is little chance that your override needs to explicitly process the style. The new line in "commonattributes" that handles the style is:

<xsl:apply-templates select="*[contains(@class,' ditaot-d/ditaval-startprop ')]/@outputclass" mode="add-ditaval-style"/>

Migrating "start-flagit", "start-revflag", "end-flagit", and "end-revflag" named templates

Calls to these templates fall into two general groups.

In the first group, the flow of your element rule is: create a start tag like <div>, call "start-flagit"/"start-revflag", process contents, call "end-revflag"/"end-flagit", and create the end tag. In this case you only need to delete the calls to these templates. Flags will be generated simply by processing the element contents in document order.

In the second group, your element rule processes flags outside of normal document order. There are generally two reasons this is done. The first case is for elements like <ol>, where flags must appear before the <ol> in order to create valid XHTML. The second is for elements like <section>, where start flags are created, followed by the title or some generated text, element contents, and finally end flags. In either of these cases, support for processing flags in document order is disabled, so they must be explicitly processed out-of-line. This is done with the following two lines (one for start flag/revision, one for end flag/revision):

Create starting flag and revision images:
<xsl:apply-templates select="*[contains(@class,' ditaot-d/ditaval-startprop ')]" mode="out-of-line"/>

Create ending flag and revision images:
<xsl:apply-templates select="*[contains(@class,' ditaot-d/ditaval-endprop ')]" mode="out-of-line"/>

For example, the following lines are used in DITA-OT 1.7 to process the <ul> element (replacing the 29 lines used in DITA-OT 1.6):

<xsl:template match="*[contains(@class,' topic/ul ')]">
  <xsl:apply-templates select="*[contains(@class,' ditaot-d/ditaval-startprop ')]" mode="out-of-line"/>
  <xsl:call-template name="setaname"/>
  <ul>
    <xsl:call-template name="commonattributes"/>
    <xsl:apply-templates select="@compact"/>
    <xsl:call-template name="setid"/>
    <xsl:apply-templates/>
  </ul>
  <xsl:apply-templates select="*[contains(@class,' ditaot-d/ditaval-endprop ')]" mode="out-of-line"/>
  <xsl:value-of select="$newline"/>
</xsl:template>

Migrating "start-flags-and-rev" and "end-flags-and-rev"

  • "start-flags-and-rev" is equivalent to calling "start-flagit" followed by "start-revflag"; it should be migrated as in the previous section.
  • "end-flags-and-rev" is equivalent to calling "end-revflag" followed by "end-flagit"; it should be migrated as in the previous section.

Migrating "revblock" and "revtext"

Calls to these two templates can be replaced with a simple call to <xsl:apply-templates/>.

Migrating modes "outputContentsWithFlags" and "outputContentsWithFlagsAndStyle"

Processing an element with either of these modes can be replaced with a simple call to <xsl:apply-templates/>.

Migrating mode="elementname-fmt"

Prior to DITA-OT 1.7, many elements were processed with the following logic:

Match element
    Set variable to determine if revisions are active and $DRAFT is on
    If active
        create division with rev style
            process element with mode="elementname-fmt"
        end division
    Else
        process element with mode="elementname-fmt"

Match element with mode="elementname-fmt"
    Process as needed

Beginning with DITA-OT 1.7, styling from revisions is handled automatically with the "commonattributes" template. This means there is no need for the extra testing, or the indirection to mode="elementname-fmt". These templates are deprecated, and element processing will move into the main element rule. Overrides that include this indirection may remove it; overrides should also be sure to match the default rule, rather than matching with mode="elementname-fmt".
