The point of the XML-based Darwin Information Typing Architecture (DITA) is to create modular technical documents that are easy to reuse with varied display and delivery mechanisms, such as helpsets, manuals, hierarchical summaries for small-screen devices, and so on. This article explains how to put the DITA principles into practice by creating a DTD and transforms that support your particular information types, rather than just using the base DITA set of concept, task, and reference.
Topic specialization is the process by which authors and architects define new topic types, while maintaining compatibility with existing style sheets, transforms, and processes. The new topic types are defined as an extension, or delta, relative to an existing topic type, thereby reducing the work necessary to define and maintain the new type.
The examples used in this paper use XML DTD syntax and XSLT; if you need background on these subjects, see Resources.
In SGML, architectural forms are a classic way to provide mappings from one document type to another. Specialization is an architectural-forms-like solution to a more constrained problem: providing mappings from a more specific topic type to a more general topic type. Because the specific topic type is developed with the general topic type in mind, specialization can ignore many of the thornier problems that architectural forms address. This constrained domain makes specialization processes relatively easy to implement and maintain. Specialization also provides support for multi-level or hierarchical specializations, which allow more general topic types to serve as the common denominator for different specialized types.
The specialization process was created to work with DITA, although its principles and processes apply to other domains as well. This will make more sense if you consider an example: Given specialization and a generic DTD such as HTML, you can create a new document type (call it MyHTML). In MyHTML you could enforce site standards for your company, including specific rules about forms layout, heading levels, and use of font and blink tags. In addition, you could provide more specific structures for product and ordering information, to enable search engines and other applications to use the data more effectively.
Specialization lets MyHTML be defined as an extension of the HTML DTD, declaring new element types only as necessary and referencing HTML's DTD for shared elements. Wherever MyHTML declares a new element, it includes a mapping back to an existing HTML element. This mapping allows the creation of style sheets and transforms for HTML that operate equally well on MyHTML documents. When you want to handle a structure differently (for example, to format product information in a particular way), you can define a new style sheet or transform that holds the extending behavior, and then import the standard style sheet or transform to handle the rest. In other words, new behavior is added as extensions to the original style sheet, in the same way that new constraints were added as extensions to the original DTD or schema.
The Darwin Information Typing Architecture is less about document types than information types. A document is considered to be made up of a number of topics, each with its own information type. A topic is, simply, a chunk of information consisting of a heading and some text, optionally divided into sections. The information type describes the content of the topic: for example, the type of a given topic might be "concept" or "task."
DITA provides a generic topic type and three information-typed topics: concept, task, and reference. Concept, task, and reference topics can all be considered specializations of topic:
Additional information types can be added to the architecture as specializations of any of these three basic types, or as a peer specialization directly off of topic; and any of these additional specializations can in turn be specialized:
Each new information type is defined as an extension of an existing information type: the specializing type inherits, without duplication, any common structures; and the specializing type provides a mapping between its new elements and the general type's existing elements. Each information type is defined in its own DTD module, which defines only the new elements for that type. A document that consists of exactly one information type (for example, a task document in a help web) has a document type defined by all the modules in the information type's specialization hierarchy (for example, task.mod and topic.mod). A document type with multiple information types (for example, a book consisting of concepts, tasks, and reference topics) includes the modules for each of the information types used, as well as the modules for their ancestors (concept.mod, task.mod, reference.mod, plus their ancestor topic.mod).
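The rule for which modules a document type needs can be sketched in a few lines. This is a minimal Python illustration, not part of DITA itself: the `PARENT` table and the function names are hypothetical, and the hierarchy it encodes is only the one described in this article.

```python
# Hypothetical parent table: each information type names the type it specializes.
PARENT = {
    "topic": None,
    "concept": "topic",
    "task": "topic",
    "reference": "topic",
    "APIdesc": "reference",
}

def ancestry(info_type):
    """Return the chain of types from the root (topic) down to info_type."""
    chain = []
    current = info_type
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return list(reversed(chain))

def modules_for(info_types):
    """List the .mod files a document type needs, ancestors first, no duplicates."""
    modules = []
    for info_type in info_types:
        for name in ancestry(info_type):
            module = name + ".mod"
            if module not in modules:
                modules.append(module)
    return modules

print(modules_for(["task"]))
print(modules_for(["concept", "task", "reference"]))
```

A single-type task document resolves to topic.mod and task.mod, while the three-type book resolves to topic.mod once, followed by the three type modules.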
Because of the separation of information types into modules, you can define new information types without affecting ancestor types. This separation gives you the following benefits:
Consider the specialization hierarchy for a reference topic:
Table 1 expresses the relationship between the general elements in topic and the specific elements in reference. Within the table, the columns, rows, and cells indicate information types, element mappings, and elements. Table 2 explains the relationships in detail to help you interpret Table 1.
Listing 1 illustrates not the actual
<!ELEMENT reference ((%title;), (%prolog;)?, (%refbody;), (%info-types;)*)>
<!ELEMENT refbody (%section; | refsyn | %simpletable; | properties)*>
<!ELEMENT properties ((%sthead;)?, (%strow;)+)>
<!ELEMENT refsyn (%section;)*>
Most of the content models declared here depend on elements or entities declared in
To expose the element mappings, we add an attribute to each element that shows its mappings to more general types.
<!ATTLIST reference class CDATA "- topic/topic reference/reference ">
<!ATTLIST refbody class CDATA "- topic/body reference/refbody ">
<!ATTLIST properties class CDATA "- topic/simpletable reference/properties ">
<!ATTLIST refsyn class CDATA "- topic/section reference/refsyn ">
Later on, we'll talk about how to take advantage of these attributes when you write an XSL transform. See the appendix for a more in-depth description of the class attribute.
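To make the shape of these attribute values concrete, here is a minimal Python sketch that splits a class value into its parts. The function name `parse_class` is hypothetical, and the sketch assumes a well-formed value: a one-character prefix followed by blank-delimited type/element pairs.

```python
def parse_class(value):
    """Split a class attribute value into (prefix, [(topic_type, element), ...])."""
    tokens = value.split()
    prefix, pairs = tokens[0], tokens[1:]
    # Each remaining token has the form "topictype/elementtype".
    return prefix, [tuple(pair.split("/", 1)) for pair in pairs]

prefix, mappings = parse_class("- topic/simpletable reference/properties ")
print(prefix)
print(mappings)
```

The pairs come back in order from most general to most specific, which is the order the attribute declares them in.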
Now that we've defined the type module (which declares the newly typed elements and their attributes) and added specialization attributes (which map the new type to its ancestors), we can assemble an authoring DTD.
<!--Redefine the infotype entity to exclude other topic types-->
<!ENTITY % info-types "reference">
<!--Embed topic to get generic elements-->
<!ENTITY % topic-type SYSTEM "topic.mod">
%topic-type;
<!--Embed reference to get specific elements-->
<!ENTITY % reference-type SYSTEM "reference.mod">
%reference-type;
Now let's create a more specialized information type: API descriptions, which are a kind of (and therefore specialization of) reference topic:
Table 3 shows part of the specialization for an information type called
As before, each cell specializes the contents of the cell to its left:
Here you can see that the content for an API description is actually much more restricted than the content of a general reference topic. The sequence of
<!ELEMENT APIdesc (APIname, (%prolog;)?, APIbody, (%info-types;)*)>
<!ELEMENT APIname (%title.cnt;)*>
<!ELEMENT APIbody (refsyn, usage, parameters, (%section;)*)>
<!ELEMENT usage (%section.notitle.cnt;)*>
<!ATTLIST usage spectitle CDATA #FIXED "Usage">
<!ELEMENT parameters ((%sthead;)?, (%strow;)+)>
Every new element now has a mapping to all its ancestor elements.
<!ATTLIST APIdesc class CDATA "- topic/topic reference/reference APIdesc/APIdesc ">
<!ATTLIST APIname class CDATA "- topic/title reference/title APIdesc/APIname ">
<!ATTLIST APIbody class CDATA "- topic/body reference/refbody APIdesc/APIbody ">
<!ATTLIST parameters class CDATA "- topic/simpletable reference/properties APIdesc/parameters ">
<!ATTLIST usage class CDATA "- topic/section reference/section APIdesc/usage ">
Note that
Now that we've defined the type module (which declares the newly typed elements and their attributes) and added specialization attributes (which map the new type to its ancestors), we can assemble an authoring DTD.
<!--Redefine the infotype entity to exclude other topic types-->
<!ENTITY % info-types "APIdesc">
<!--Embed topic to get generic elements-->
<!ENTITY % topic-type SYSTEM "topic.mod">
%topic-type;
<!--Embed reference to get more specific elements-->
<!ENTITY % reference-type SYSTEM "reference.mod">
%reference-type;
<!--Embed APIdesc to get most specific elements-->
<!ENTITY % APIdesc-type SYSTEM "APIdesc.mod">
%APIdesc-type;
After a specialized type has been defined and the necessary attributes have been declared, they can provide the basis for the following operations:
Because content written in a new information type (such as
To override this default behavior, an author can simply create a new, more specific rule for that element type, and then import the default style sheet or transform, thus extending the behavior without directly editing the original style sheet or transform. This reuse by reference reduces maintenance costs (each site maintains only the rules it uniquely requires) and increases consistency (because the core transform rules can be centrally maintained, and changes to the core rules will be reflected in all other transforms that import them). Control over reuse has moved from the author of the transform to the reuser of the transform.
The rest of this section assumes knowledge of XSLT, the XSL Transformations language.
This process works only if the general transforms have been enabled to handle specialized elements, and if the specialized elements include enough information for the general transform to handle them.
To provide the specialization information, you need to add specialization attributes, as outlined previously. After you include the attributes in your documents, they are ready to be processed by specialization-aware transforms.
For the transform, you need template rules that check for a match against both the element name and the attribute value.
<xsl:template match="*[contains(@class,' topic/simpletable ')]">
  <!--matches any element that has a class attribute that mentions topic/simpletable-->
  <!--do something-->
</xsl:template>
To override the general transform for a specific element, the author of a new information type can create a transform that declares the new behavior for the specific element and imports the general transform to provide default behavior for the other elements.
For example, an
<xsl:import href="general-transform.xsl"/>
<xsl:template match="*[contains(@class,' APIdesc/parameters ')]">
  <!--do something-->
  <xsl:apply-templates/>
</xsl:template>
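The override-then-fall-back pattern is not specific to XSLT. The following Python sketch imitates it under simplifying assumptions: `format_general` stands in for the imported general transform, `format_apidesc` for the importing one, and the returned strings are placeholders rather than real output. All of these names are hypothetical.

```python
import xml.etree.ElementTree as ET

def has_class(element, token):
    """True if the element's class attribute lists token as a whole value."""
    return " " + token + " " in element.get("class", "")

def format_general(element):
    """Stand-in for the general transform: knows only topic-level classes."""
    if has_class(element, "topic/simpletable"):
        return "generic table"
    return "generic block"

def format_apidesc(element):
    """Stand-in for the importing transform: one override, then fall back."""
    if has_class(element, "APIdesc/parameters"):
        return "parameter table"
    return format_general(element)

parameters = ET.Element(
    "parameters",
    {"class": "- topic/simpletable reference/properties APIdesc/parameters "},
)
properties = ET.Element(
    "properties", {"class": "- topic/simpletable reference/properties "}
)
print(format_apidesc(parameters))  # parameter table
print(format_apidesc(properties))  # generic table
```

Only the one specialized rule lives in the new code; everything else is handled by the general function, which never needs to be edited.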
Both the preexisting
Because a specialized information type is also an instance of its ancestor types (an
To safely generalize a topic, you need a way to map from your information type to the target information type. You also need a way to preserve the original type in case you need round-tripping later.
The
Each level of specialization has its own set of class attributes, which in the end provide the full specialization hierarchy for all specialized elements.
Consider the
<APIdesc>
  <APIname>AnAPI</APIname>
  <APIbody>
    <refsyn>AnAPI (parm1, parm2)</refsyn>
    <usage spectitle="Usage">Use AnAPI to pass parameters to your process.</usage>
    <parameters> ... </parameters>
  </APIbody>
</APIdesc>
With the class attributes exposed (all values are provided as defaults by the DTD):
<APIdesc class="- topic/topic reference/reference APIdesc/APIdesc ">
  <APIname class="- topic/title reference/title APIdesc/APIname ">AnAPI</APIname>
  <APIbody class="- topic/body reference/refbody APIdesc/APIbody ">
    <refsyn class="- topic/section reference/refsyn ">AnAPI(parm1, parm2)</refsyn>
    <usage class="- topic/section reference/section APIdesc/usage " spectitle="Usage">
      <p class="- topic/p ">Use AnAPI to pass parameters to your process.</p>
    </usage>
    <parameters class="- topic/simpletable reference/properties APIdesc/parameters "> ... </parameters>
  </APIbody>
</APIdesc>
From here, a single template rule can transform the entire
After a transform to topic, it should look something like Listing 13:
<topic class="- topic/topic reference/reference APIdesc/APIdesc ">
  <title class="- topic/title reference/title APIdesc/APIname ">AnAPI</title>
  <body class="- topic/body reference/refbody APIdesc/APIbody ">
    <section class="- topic/section reference/refsyn ">AnAPI(parm1, parm2)</section>
    <section class="- topic/section reference/section APIdesc/usage " spectitle="Usage">
      <p class="- topic/p ">Use AnAPI to pass parameters to your process.</p>
    </section>
    <simpletable class="- topic/simpletable reference/properties APIdesc/parameters "> ... </simpletable>
  </body>
</topic>
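The generalization step can be sketched outside XSLT as well. This minimal Python version is an illustration rather than the article's transform: it assumes the class attributes are already present in the instance (a DTD-aware parser would normally supply them as defaults), and the function name `generalize` is hypothetical.

```python
import xml.etree.ElementTree as ET

def generalize(root, target="topic"):
    """Rename every element to the name its class attribute gives for target."""
    prefix = target + "/"
    for node in root.iter():
        for token in node.get("class", "").split():
            if token.startswith(prefix):
                node.tag = token[len(prefix):]
                break  # the class attribute itself is left untouched
    return root

doc = ET.fromstring(
    '<APIname class="- topic/title reference/title APIdesc/APIname ">AnAPI</APIname>'
)
generalize(doc)
print(doc.tag)  # title
```

Because only the element name changes, the class attribute still records the original type, which is what makes a later round trip possible.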
Even after generalization, specialization-aware transforms can continue to treat the topic as an
From here, it is possible to round-trip by reversing the transformation (looking in the
However, if anyone changes the structure of the content while it is a generic
It is relatively trivial to specialize a general topic if the content was originally authored as a specialized type. However, a more complex case can result if you have authored content at a general level that you now want to type more precisely.
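For the simple case, where the class attributes survived generalization intact, re-specializing amounts to renaming each element to the last value in its class attribute. The Python sketch below illustrates that idea only; the function name `respecialize` is hypothetical, and the sketch does not check that the result is valid under the specialized content model.

```python
import xml.etree.ElementTree as ET

def respecialize(root):
    """Rename every element to the most specific name in its class attribute."""
    for node in root.iter():
        tokens = node.get("class", "").split()
        # The last blank-delimited value names the most specialized element type.
        if tokens and "/" in tokens[-1]:
            node.tag = tokens[-1].split("/", 1)[1]
    return root

generic = ET.fromstring(
    '<topic class="- topic/topic reference/reference APIdesc/APIdesc ">'
    '<title class="- topic/title reference/title APIdesc/APIname ">AnAPI</title>'
    '</topic>'
)
respecialize(generic)
print(generic.tag, generic[0].tag)  # APIdesc APIname
```

A real migration would still validate the output against the specialized DTD, since nothing here guarantees the structure was left unchanged while it was generic.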
For example, suppose that you create a set of reference topics. Then, having analyzed your content, you realize that you have a consistent pattern. Now you want to enforce this pattern and describe it with a specialized information type (for example, API descriptions). In order to specialize, you need to first create the target DTD and then add enough information to your content to allow it to be migrated.
You can put the specializing information in either of two places:
In either case, before migration you can run a validation transform that looks for the appropriate attribute, then checks that the content of the element will be valid under the specialized content model. You can use a tool like Schematron to generate both the validating transform and the migrating transform, or you can migrate first and use the specialized DTD to validate that the migration was successful.
Like the XML DTD syntax, the XML Schema language is a way of defining a vocabulary (elements and attributes) and a set of constraints on that vocabulary (such as content models, or fixed vs. implied attributes). It has a built-in specialization mechanism, which includes the capability to restrict allowable specializations. Using the XML Schema language instead of DTDs would make it much easier to validate that specialized information types represent valid subsets of generic types, which ensures smooth processing by generic translation and publishing transforms.
Unlike DTDs, XML schemas are expressed as XML documents. As a result, they can be processed in ways that DTDs cannot. For example, we can maintain a single XML schema and then use XSL to generate two versions:
However, XML schemas are not yet popular enough to adopt wholeheartedly. The main problems are a lack of authoring tools, and incompatibilities between the implementations of an evolving standard. These problems should be remedied by the industry over the next year or so, as the standard is finalized and schemas become more widely adopted and supported.
You can create a specialized information type by using this general procedure:
You can create specialized XSL transforms by using this general procedure:
Although you could create a new element equivalent for any tag in a general DTD, this work is useless to you as an author unless the content models that would include the tag are also specialized. In the
This domino effect can be avoided by using domain specialization. If you truly just want to add some new variant structures to an existing information type, use domain specialization instead of topic specialization (see
To ensure that the specialized elements are more constrained than their general equivalents (that is, that they allow a proper subset of the structures that the general equivalent allows), you need to look at the content model of the general element. You can safely change the content model of your specialized element as shown in Table A:
You have a general element
<!ELEMENT General (a,b?,(c|d+))>
When you specialize
Leaving aside renaming (which is always allowed, and simply means that you are also specializing some of the elements that
<!ELEMENT Special (a,b,(c|d))>
<!ELEMENT Special (a,b?,c)>
<!ELEMENT Special (a,b?,d1,d2,d3)>
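One way to convince yourself that a specialized content model is a proper subset is to generalize sample child sequences and test them against the general model. The Python sketch below does this for the General element above by writing its content model as a regular expression over space-joined child names. It assumes that d1, d2, and d3 are specializations of d; that mapping and the function name are illustrative assumptions.

```python
import re

# The General content model (a,b?,(c|d+)) as a regex over space-joined child names.
GENERAL = re.compile(r"a( b)?( c|( d)+)")

# Assumed renaming: d1, d2, and d3 each specialize d.
TO_GENERAL = {"d1": "d", "d2": "d", "d3": "d"}

def valid_as_general(children):
    """Generalize the child names, then check them against the General model."""
    names = [TO_GENERAL.get(name, name) for name in children]
    return GENERAL.fullmatch(" ".join(names)) is not None

print(valid_as_general(["a", "b", "c"]))               # True: fits (a,b,(c|d))
print(valid_as_general(["a", "b", "d1", "d2", "d3"]))  # True: fits (a,b?,d1,d2,d3)
print(valid_as_general(["a", "c", "d"]))               # False: c and d together
```

Sampling sequences like this is a spot check, not a proof; it catches a loosened model quickly but cannot show that every sequence the specialized model allows is covered.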
Every element must have a class attribute. The class attribute starts and ends with white space, and contains a list of blank-delimited values. Each value has two parts: the first part identifies a topic type, and the second part (after a /) identifies an element type. The class attribute value should be declared as a default attribute value in the DTD. Generally, it should not be modified by the author.
Example:
<appstep class="- topic/li task/step bctask/appstep ">A specialized step</appstep>
When a specialized type declares new elements, it must provide a class attribute for the new element. The class attribute must include a mapping for every topic type in the specialized type's ancestry, even those in which no element renaming occurred. The mapping should start with topic, and finish with the current element type.
Example:
<appname class="- topic/kwd task/kwd bctask/appname ">
This is necessary so that generalizing and specializing transforms can map values simply and accurately. For example, if task/kwd was missing as a value, and I decided to map this bctask up to a task topic, then the transform would have to guess whether to map to kwd (appropriate if task is more general, which it is) or leave as appname (appropriate if task were more specialized, which it isn't). By always providing mappings for more general values, we can then apply the simple rule that missing mappings must by default be to more specialized values, which means the last value in the list is appropriate. While this example is trivial, more complicated hierarchies (say, five levels deep, with renaming occurring at two and four only) make this kind of mapping essential.
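The lookup rule described here, use the mapping for the target type if present and otherwise fall back to the last value, fits in a few lines. This Python sketch is illustrative only: the function name `element_for` is hypothetical, and it assumes a well-formed class value with at least one type/element pair after the prefix.

```python
def element_for(class_value, target_type):
    """Return the element name to use when mapping to target_type."""
    # Skip the leading prefix token, then split each "type/element" pair.
    pairs = [token.split("/", 1) for token in class_value.split()[1:]]
    for topic_type, element in pairs:
        if topic_type == target_type:
            return element
    # No mapping for the target: it must be more specialized than anything
    # listed, so the last (most specific) value applies.
    return pairs[-1][1]

print(element_for("- topic/kwd task/kwd bctask/appname ", "task"))  # kwd
print(element_for("- topic/p ", "task"))                            # p
```

The second call shows why p needs no task or bctask mapping: the fallback to the last value already gives the right answer.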
A specialized type does not need to change the class attribute for elements that it does not specialize, but simply reuses by reference from more generic levels. For example, since task and bctask use the p element without specializing it, they don't need to declare mappings for it.
A specialized type only declares class attributes for the elements that it uniquely declares. It does not need to declare class attributes for elements that it reuses or inherits.
Applying an XSLT template based on class attribute values allows a transform to be applied to whole branches of element types, instead of just a single element type.
Wherever you would check for element name (any XPath statement that contains an element name value), you need to enhance this to instead check the contents of the element's class attribute. Even if the element is unrecognized, the class attribute can let the transform know that the element belongs to a class of known elements, and can be safely treated according to their rules.
Example:
<xsl:template match="*[contains(@class,' topic/li ')]">
This match statement will work on any li element it encounters. It will also work on step and appstep elements, even though it doesn't know what they are specifically, because the class attribute tells the template what they are generally.
<xsl:template match="*[contains(@class,' task/step ')]">
This match statement won't work on generic li elements, but it will work on both step elements and appstep elements; even though it doesn't know what an appstep is, it knows to treat it like a step.
Be sure to include a leading and trailing blank in your class attribute string check. Otherwise you could get false matches (without the blanks, 'task/step' would match on 'notatask/stepaway', when it shouldn't).
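The difference the blanks make is easy to demonstrate. The Python sketch below compares a blank-delimited check with a naive substring check; both function names are hypothetical, and the class values are the ones used in this appendix.

```python
def matches(class_value, token):
    """Blank-delimited check: token must appear as a whole value."""
    return " " + token + " " in class_value

def naive_matches(class_value, token):
    """Substring check without blanks: prone to false matches."""
    return token in class_value

good = "- topic/li task/step bctask/appstep "
bad = "- topic/li notatask/stepaway "

print(matches(good, "task/step"))       # True
print(matches(bad, "task/step"))        # False
print(naive_matches(bad, "task/step"))  # True, a false match
```

The check relies on the class value itself ending with a blank, which is why the attribute is required to start and end with white space.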
When you create a domains specialization, the new elements still need a class attribute, but should start with a "+" instead of a "-". This signals any generalization transforms to treat the element differently: a domains-aware generalization transform may have different logic for handling domains than for handling topic specializations.
Domain specializations should be derived either from topic (the root topic type), or from another domain specialization. Do not create a domain by specializing an already specialized topic type: this can result in unpredictable generalization behavior, and is not currently supported by the architecture.
© Copyright International Business Machines Corp., 2002, 2003. All rights reserved.
The information provided in this document has not been submitted to any formal IBM test and is distributed "AS IS," without warranty of any kind, either express or implied. The use of this information or the implementation of any of these techniques described in this document is the reader's responsibility and depends on the reader's ability to evaluate and integrate them into their operating environment. Readers attempting to adapt these techniques to their own environments do so at their own risk.