<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="lillet" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#">
<info>
    
    
    
    
    
    
    
    
<title>DITA for DocBook</title>
<biblioid class="uri">http://norman.walsh.name/2005/10/21/dita</biblioid>
<volumenum>8</volumenum>
<issuenum>136</issuenum>
<pubdate>2005-10-21T12:51:25-04:00</pubdate>
<date>$Date: 2005-10-21 13:59:54 -0400 (Fri, 21 Oct 2005) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2005</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>Implementing the Darwin Information Typing Architecture for DocBook.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#DITA"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#DocBook"/>
</info>

<para xml:id="p1">I've been trying to get my head around <link xlink:href="http://en.wikipedia.org/wiki/DITA">DITA</link> for a while
now. The trouble is, DocBook isn't my day job, and the DITA spec is
fairly hefty, so it's taken rather longer than I would have liked.
I've also been struggling with an emotional impediment: DocBook has
never really had any competition before and I don't relish the thought
of fighting with anyone about it. But
like it or not, DocBook and DITA are competitors, at least to the
extent that both are aimed at the technical documentation market. At
the end of the day, though, the markup vocabulary you choose is your
business and I don't suffer if you choose not to use DocBook.</para>

<para xml:id="p2">That said, I think I've got my head around DITA now
and if you line DocBook and DITA up, I think DITA can point to four
technical differences that are arguably features in its favor:</para>

<orderedlist>
<listitem>
      <para xml:id="p3">A topic-oriented authoring paradigm.
</para>
    </listitem>
<listitem>
      <para xml:id="p4">A cross-referencing scheme that's more practical than
XML's flat ID space.
</para>
    </listitem>
<listitem>
      <para xml:id="p5">SGML's conref, reinvented.
</para>
    </listitem>
<listitem>
      <para xml:id="p6">An extensibility model based on “specialization”.
</para>
    </listitem>
</orderedlist>

<para xml:id="p7">Well, heck, if that's all DITA has going for it,
DocBook can do those things. :-)</para>

<section xml:id="topics">
<title>Topics</title>

<para xml:id="p8">DocBook's legacy is certainly big, linear documents:
it even has the word “book” in its name. But there's
<emphasis>nothing</emphasis> that prevents you from writing modern,
topic-oriented, highly modular documentation in DocBook. Nothing
except, perhaps, the emotional weight of the tag names. “Book”,
“Chapter”, “Section” all sound like monolithic, linear structures.
Even “Article” feels a little bit like ink on dead trees.</para>

<para xml:id="p9">Fine. I can invent a “Topic” element to fix that:</para>

<programlisting>dita.topic =
   element topic {
      dita.topic.attlist,
      dita.topic.info,
      db.all.blocks*,
      db.section*
   }</programlisting>

<para xml:id="p10">The <tag>topic</tag> element is only half the
story, though. DITA also has a complicated system for combining topics
together based on map files. A map file identifies the topics that are
part of a given deliverable (a set of web pages, a help system, even
a book).</para>

<para xml:id="p11">No problemo. I can have map files too:</para>

<programlisting>
dita.map =
   element map {
      dita.map.attlist,
      dita.map.info,
      dita.topicref+
   }

dita.topicref =
   element topicref {
      dita.topicref.attlist,
      dita.topicref*
   }</programlisting>

<para xml:id="p12">There's a bit more to DITA map files than
<tag>topicref</tag>s, but I think that's the most significant part.
Other parts, such as the mechanism for tabulating relationships
between topics, are equally easy to construct.
</para>
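
<para>To make the map semantics concrete, here is a minimal XSLT 2.0
sketch of how a stylesheet might assemble topics from a map. It is an
illustration under stated assumptions, not the code from the actual
stylesheet: the <tag class="attribute">href</tag> attribute on
<tag>topicref</tag> is borrowed from DITA (the attribute list isn't
shown above), the <literal>assemble</literal> mode is a hypothetical
name, and nested <tag>topicref</tag>s are simply appended inside their
parent topic.</para>

<programlisting>&lt;xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:db="http://docbook.org/ns/docbook"&gt;

&lt;!-- Keep the map element as the wrapper for the assembled topics. --&gt;
&lt;xsl:template match="db:map"&gt;
  &lt;xsl:copy&gt;
    &lt;xsl:apply-templates select="db:topicref" mode="assemble"/&gt;
  &lt;/xsl:copy&gt;
&lt;/xsl:template&gt;

&lt;!-- Replace each topicref with the root element of the file it names.
     ASSUMPTION: topicref carries an href attribute, as in DITA. --&gt;
&lt;xsl:template match="db:topicref" mode="assemble"&gt;
  &lt;xsl:variable name="children" select="db:topicref"/&gt;
  &lt;xsl:for-each select="doc(resolve-uri(@href, base-uri(.)))/*"&gt;
    &lt;xsl:copy&gt;
      &lt;xsl:copy-of select="@*, node()"/&gt;
      &lt;!-- Nested topicrefs become nested topics. --&gt;
      &lt;xsl:apply-templates select="$children" mode="assemble"/&gt;
    &lt;/xsl:copy&gt;
  &lt;/xsl:for-each&gt;
&lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>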
</section>

<section xml:id="xrefs">
<title>Cross References</title>

<para xml:id="p13">XML IDs are required to be unique across an entire document. In a
system for reusable, modular documentation, that can be a real drag.
Even assuming you can manage globally unique IDs across a large number
of independent topic files, reuse can break the flat ID space.</para>

<para xml:id="p14">Consider a unit of content that you might want to
reuse, a <tag>note</tag> or <tag>table</tag> or something. If it has
an ID, and if you pull that element into several different topics and
those topics get pulled together by the map file, you're guaranteed to
have the same ID appearing several times in your final, combined set
of topics.</para>

<para xml:id="p15">The DITA solution is clever: scoped IDs. Given that the topic
is the unit of documentation, I can say that <tag>topic</tag>s must
have globally unique IDs, but that every other element will be
referenced within the scope of its containing topic. This is
accomplished by inventing a fragment identifier syntax. Consider this
topic:</para>

<programlisting>&lt;topic xml:id="topic1"&gt;
&lt;info&gt;
&lt;title&gt;Example Topic 1&lt;/title&gt;
&lt;/info&gt;
&lt;para&gt;Some topic content.&lt;/para&gt;
&lt;note xml:id="usefulnote"&gt;
&lt;para&gt;This note isn't really useful, but pretend it is.
&lt;/para&gt;
&lt;/note&gt;
&lt;/topic&gt;</programlisting>

<para xml:id="p18">The ID/IDREF way of referring to that note would be
with its ID value: <code>&lt;link linkend="usefulnote"&gt;this
note&lt;/link&gt;</code>. But that's ambiguous if the <tag>note</tag>
appears in more than one topic, so instead I use:
<code>&lt;link xlink:href="#topic1/usefulnote"&gt;this
note&lt;/link&gt;</code>. The semantics of this fragment identifier syntax
are straightforward: find the second ID (<literal>usefulnote</literal>)
inside the topic with the first ID (<literal>topic1</literal>).
Then if I say that this is the fragment identifier syntax for
documents in this system (i.e. with some media type that I still have
to invent), I've closed the loop (web-) architecturally.</para>
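
<para>A minimal XSLT 2.0 sketch of how a stylesheet might resolve such
a fragment identifier follows. The function name
<literal>f:resolve-fragment</literal> and its namespace are
hypothetical, it only handles same-document references, and it assumes
that topics are literal <tag>topic</tag> elements in the DocBook
namespace.</para>

<programlisting>&lt;xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:db="http://docbook.org/ns/docbook"
                xmlns:f="http://example.com/ns/functions"&gt;

&lt;!-- Hypothetical helper: resolve "#topicid/localid" (or just "#topicid")
     against the given document. Returns the empty sequence on failure. --&gt;
&lt;xsl:function name="f:resolve-fragment" as="element()?"&gt;
  &lt;xsl:param name="root" as="document-node()"/&gt;
  &lt;xsl:param name="href" as="xs:string"/&gt;
  &lt;xsl:variable name="frag" select="substring-after($href, '#')"/&gt;
  &lt;xsl:variable name="topicid"
                select="if (contains($frag, '/'))
                        then substring-before($frag, '/')
                        else $frag"/&gt;
  &lt;xsl:variable name="localid" select="substring-after($frag, '/')"/&gt;
  &lt;!-- Topic IDs are global, so search the whole document for the topic. --&gt;
  &lt;xsl:variable name="topic"
                select="($root//db:topic[@xml:id = $topicid])[1]"/&gt;
  &lt;!-- Every other ID is looked up only inside that topic. --&gt;
  &lt;xsl:sequence select="if ($localid = '')
                        then $topic
                        else ($topic//*[@xml:id = $localid])[1]"/&gt;
&lt;/xsl:function&gt;

&lt;/xsl:stylesheet&gt;</programlisting>

<para>With the example topic above,
<code>f:resolve-fragment(/, '#topic1/usefulnote')</code> would select
the <tag>note</tag>, even if another topic in the same combined
document also contains an element with the ID
<literal>usefulnote</literal>.</para>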

<para xml:id="p19">Now, in theory, I've still got the technical
problem that I have multiple <tag class="attribute">xml:id</tag>
attributes with the same value in the combined set of topics. I could
only avoid this by using a different attribute name. But I actually
think it's better to ignore this theoretical problem. In practice,
what this means is that the validator will check the uniqueness of IDs
as long as I validate individual topics. That's going to catch
cut-and-paste errors, and I think that's worth bending the rules slightly
at build time.</para>

<para xml:id="p20">I can implement this system by adjusting the
stylesheets to understand these fragment identifiers and by turning
off ID/IDREF linking:</para>

<programlisting>db.linkend.attribute = notAllowed
db.linkends.attribute = notAllowed
db.endterm.attribute = notAllowed</programlisting>

<para xml:id="p21">There. That was easy.</para>
</section>

<section xml:id="conref">
<title>Conref</title>

<para xml:id="p22">Conceptually, “conref” (or <emphasis>content
reference</emphasis>) is a kind of cross reference. But instead of
pointing to its content, it
<link xlink:href="http://en.wikipedia.org/wiki/Transclusion">transcludes</link>
it. The practical benefit of
conref is that it replaces some uses of entities or
<link xlink:href="http://en.wikipedia.org/wiki/XInclude">XInclude</link>.
DITA's reinvention of conref has a couple of interesting
features:</para>

<itemizedlist>
<listitem>
	<para xml:id="p23">It transcludes the content of the element
it points to, but not the element itself. This means you can reuse an
element without reusing its ID or other attribute values.</para>
</listitem>
<listitem>
	<para xml:id="p24">A conref must point to an element of the
same type. In other words, you can conref from one <tag>para</tag> to
another, but not from a <tag>para</tag> to a <tag>note</tag>.</para>
</listitem>
</itemizedlist>

<para xml:id="p25">Consider the useful note from above. If I wanted to
reuse it in a new topic, how would I do it? I could put it in an
entity and reference it in both places, or I could use XInclude. But
neither of these would have the features above, so instead, I use a
new <tag class="attribute">conref</tag> attribute:</para>

<programlisting>&lt;note conref="#topic1/usefulnote"/&gt;</programlisting>

<para xml:id="p26">That's easy to add to DocBook:</para>

<programlisting>db.common.attributes &amp;= dita.conref.attribute?
db.common.idreq.attributes &amp;= dita.conref.attribute?</programlisting>

<para xml:id="p27">An additional semantic of conref is that an element
with a <tag class="attribute">conref</tag> attribute must be empty.
Although
<link xlink:href="http://en.wikipedia.org/wiki/RELAX_NG">RELAX NG</link>
could be persuaded to enforce that constraint, it seems
tedious to do so for a common attribute. Instead, I'll eventually rely on
<link xlink:href="http://en.wikipedia.org/wiki/Schematron">Schematron</link>
assertions to test for that (I haven't written them just yet). In the
meantime, I've made the stylesheet that performs the transclusion
enforce that constraint.</para>
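
<para>The transclusion and its checks might be sketched in XSLT 2.0
along these lines. This is an illustration rather than the actual
stylesheet code: the <literal>normalize</literal> mode name is
hypothetical, the <tag class="attribute">conref</tag> attribute is
assumed to be in no namespace, and only same-document references of the
form <literal>#topic/id</literal> are handled.</para>

<programlisting>&lt;xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;

&lt;xsl:template match="/"&gt;
  &lt;xsl:apply-templates mode="normalize"/&gt;
&lt;/xsl:template&gt;

&lt;!-- Identity transform for everything that has no conref. --&gt;
&lt;xsl:template match="@*|node()" mode="normalize"&gt;
  &lt;xsl:copy&gt;
    &lt;xsl:apply-templates select="@*|node()" mode="normalize"/&gt;
  &lt;/xsl:copy&gt;
&lt;/xsl:template&gt;

&lt;xsl:template match="*[@conref]" mode="normalize"&gt;
  &lt;xsl:variable name="frag" select="substring-after(@conref, '#')"/&gt;
  &lt;!-- Find the second ID inside the element that has the first ID. --&gt;
  &lt;xsl:variable name="target"
      select="(//*[@xml:id = substring-before($frag, '/')]
                //*[@xml:id = substring-after($frag, '/')])[1]"/&gt;
  &lt;xsl:choose&gt;
    &lt;xsl:when test="* or text()[normalize-space()]"&gt;
      &lt;xsl:message terminate="yes"&gt;An element with conref must be empty.&lt;/xsl:message&gt;
    &lt;/xsl:when&gt;
    &lt;xsl:when test="empty($target)"&gt;
      &lt;xsl:message terminate="yes"&gt;Cannot resolve conref: &lt;xsl:value-of select="@conref"/&gt;&lt;/xsl:message&gt;
    &lt;/xsl:when&gt;
    &lt;xsl:when test="node-name($target) ne node-name(.)"&gt;
      &lt;xsl:message terminate="yes"&gt;A conref must point to an element of the same type.&lt;/xsl:message&gt;
    &lt;/xsl:when&gt;
    &lt;xsl:otherwise&gt;
      &lt;!-- Keep the referencing element and its own attributes (minus conref);
           pull in only the content of the target, not the target itself. --&gt;
      &lt;xsl:copy&gt;
        &lt;xsl:copy-of select="@* except @conref"/&gt;
        &lt;xsl:apply-templates select="$target/node()" mode="normalize"/&gt;
      &lt;/xsl:copy&gt;
    &lt;/xsl:otherwise&gt;
  &lt;/xsl:choose&gt;
&lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>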
</section>

<section xml:id="specialization">
<title>Specialization</title>

<para xml:id="p28">DITA's extensibility mechanism is perhaps its most
clever invention. While it's easy to extend DocBook, for example by adding a new
element, doing so introduces an interoperability problem.</para>

<para xml:id="p29">Suppose you invent a new kind of list, a product
list. Imagine that the important semantic of a product list on
your system is that products named in a product list are automatically
verified against a manifest. In all other respects, it's
just a regular ordered list.</para>

<para xml:id="p30">The DocBook way to do this in a portable manner is
with the <tag class="attribute">role</tag> attribute:</para>

<programlisting>&lt;orderedlist role="productlist"&gt;
&lt;listitem&gt;&lt;para&gt;1 &lt;productname&gt;oscillation overthruster&lt;/productname&gt;
&lt;/para&gt;&lt;/listitem&gt;
&lt;listitem&gt;&lt;para&gt;4 &lt;productname&gt;#11 screws&lt;/productname&gt;
&lt;/para&gt;&lt;/listitem&gt;
&lt;listitem&gt;&lt;para&gt;1 &lt;productname&gt;watermelon&lt;/productname&gt;
&lt;/para&gt;&lt;/listitem&gt;
&lt;/orderedlist&gt;</programlisting>

<para xml:id="p34">The problem is, if the reason you're inventing the
new kind of element is to give it a slightly different content model,
this approach doesn't really work. (In fact, you can make it work in
RELAX NG, but it'd be really ugly for authors.)</para>

<para xml:id="p35">What you'd like to do instead is just invent a new
tag, <tag>productlist</tag>, and use that:</para>

<programlisting>&lt;productlist&gt;
&lt;listitem&gt;&lt;para&gt;1 &lt;productname&gt;oscillation overthruster&lt;/productname&gt;
&lt;/para&gt;&lt;/listitem&gt;
&lt;listitem&gt;&lt;para&gt;4 &lt;productname&gt;#11 screws&lt;/productname&gt;
&lt;/para&gt;&lt;/listitem&gt;
&lt;listitem&gt;&lt;para&gt;1 &lt;productname&gt;watermelon&lt;/productname&gt;
&lt;/para&gt;&lt;/listitem&gt;
&lt;/productlist&gt;</programlisting>

<para xml:id="p39"><emphasis>Now</emphasis> the problem is, if you
want to format that element, you have to modify the stylesheets and if
you want to interchange your topics with others, they all have
to have your stylesheet customizations too.</para>

<para xml:id="p40">DITA overcomes this by describing extensions in
terms of
<link xlink:href="http://docs.oasis-open.org/dita/v1.0/archspec/ditaspecialization.html">specialization</link> or subtyping. When you invent a new element,
you also say what kind of element it specializes. When the stylesheets
(or other tools) don't know what to do with your special element, they
can automatically treat it as if it were the more general element that
it specializes.</para>

<para xml:id="p41">The DITA mechanism for accomplishing this is an
ingenious, if elaborate, system of fixed attribute values in the DTD.
This leads to odd-looking stylesheets that almost exclusively use
patterns of the form:</para>

<programlisting>&lt;xsl:template match="*[contains(@class, ' <replaceable>some/value</replaceable> ')]"&gt;
  ...
&lt;/xsl:template&gt;</programlisting>

<para xml:id="p42">In addition to a sort of baroque scheme for
implementing this in DTDs, DITA also appears to have the limitation
that specializations must be isomorphic to something in the base
system. That, in turn, forces some of the elements in the base system
to have…interesting content models.</para>

<para xml:id="p43">Consider DITA's <tag>topic</tag> for example. The
content model of a topic body is
“<code>(p|note|...|section|example)*</code>”. On the face of it, that
allows topics to contain a free mixture of sections and paragraphs,
which one wouldn't ordinarily consider a good thing. I gather that
this is necessary so that some specialization of <tag>topic</tag> can
have an element that's required to occur last (after
<tag>section</tag>), that is itself a specialization of <tag>p</tag>.
But I could be wrong about that.</para>

<para xml:id="p44">Anyway, the idea of specialization is useful and
interesting, and I can accomplish the same thing on top of DocBook
by taking advantage of annotations in the schema.
In broad strokes:</para>

<orderedlist>
<listitem>
	<para xml:id="p45">I add annotations to the RELAX NG grammar
for the extensions. These annotations describe how to transform each
new element back to some base element in DocBook.
</para>
      </listitem>
<listitem>
	<para xml:id="p46">I add a parameter to the stylesheets so
that they can know what schema is being used for the document.
This is conceptually no different from the DITA case where
the DTD for the extension is, in practice, required.
</para>
      </listitem>
<listitem>
	<para xml:id="p47">The stylesheets already have a
“normalization” phase that adjusts content in the source document; I
extended that phase to include handling “unknown” elements by mapping
them back to DocBook as described by the annotations.
</para>
      </listitem>
</orderedlist>

<para xml:id="p48">So all you have to do is add the annotation to your
extension:</para>

<programlisting>dita.productlist =
   [
      r:remap [ db:orderedlist [] ]
   ]
   element productlist {
      dita.productlist.attlist,
      dita.productlist.info,
      db.all.blocks*,
      db.listitem+
   }</programlisting>

<para xml:id="p49">And you're done. A <tag>productlist</tag> will be
treated exactly like an <tag>orderedlist</tag>.</para>
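
<para>On the stylesheet side, the simple isomorphic case might be
handled roughly as in the following XSLT 2.0 sketch. It rests on several
assumptions that the text above doesn't confirm: the parameter name
<literal>schema-uri</literal>, the <literal>remap</literal> mode, and
the namespace URI bound to <literal>r</literal> are all placeholders,
and the annotation is assumed to appear in the RNG form of the schema as
a child of the corresponding <tag>element</tag> pattern. Unlike the
behaviour described here, the sketch remaps every annotated element
without checking whether a template for it already exists.</para>

<programlisting>&lt;xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:rng="http://relaxng.org/ns/structure/1.0"
                xmlns:r="http://example.com/ns/remap"&gt;

&lt;!-- ASSUMPTION: the schema location is passed in as a parameter. --&gt;
&lt;xsl:param name="schema-uri" as="xs:string" select="'dita4db.rng'"/&gt;
&lt;xsl:variable name="schema" select="doc($schema-uri)"/&gt;

&lt;xsl:template match="/"&gt;
  &lt;xsl:apply-templates mode="remap"/&gt;
&lt;/xsl:template&gt;

&lt;xsl:template match="*" mode="remap"&gt;
  &lt;xsl:variable name="name" select="local-name(.)"/&gt;
  &lt;!-- Look for a remap annotation on the pattern that defines this element. --&gt;
  &lt;xsl:variable name="base"
      select="($schema//rng:element[@name = $name]/r:remap/*)[1]"/&gt;
  &lt;xsl:choose&gt;
    &lt;!-- Isomorphic case: the annotation names an empty base element,
         so rename this element and keep its attributes and content. --&gt;
    &lt;xsl:when test="exists($base) and empty($base/node())"&gt;
      &lt;xsl:element name="{local-name($base)}"
                   namespace="{namespace-uri($base)}"&gt;
        &lt;xsl:copy-of select="@*"/&gt;
        &lt;xsl:apply-templates mode="remap"/&gt;
      &lt;/xsl:element&gt;
    &lt;/xsl:when&gt;
    &lt;!-- No (simple) annotation: copy the element through unchanged. --&gt;
    &lt;xsl:otherwise&gt;
      &lt;xsl:copy&gt;
        &lt;xsl:copy-of select="@*"/&gt;
        &lt;xsl:apply-templates mode="remap"/&gt;
      &lt;/xsl:copy&gt;
    &lt;/xsl:otherwise&gt;
  &lt;/xsl:choose&gt;
&lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>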

<para xml:id="p50">Although I didn't show it above, this technique is
used in the definition of <tag>topic</tag> to map topics back to
sections. And for my DITA experiment, where I created DocBook
<tag>task</tag>, <tag>concept</tag>, and <tag>reference</tag>
specializations of <tag>topic</tag>, I used exactly the same
technique. A <tag>task</tag> is remapped to a <tag>topic</tag> if
there's no template for <tag>task</tag>, which is, in turn, remapped
to a <tag>section</tag>, if there's no template for
<tag>topic</tag>.</para>

<para xml:id="p51">Using the annotation technique, there's no
requirement that extensions be isomorphic to something already in
DocBook, though that's the simplest case. Consider the DITA
<tag>relatedlinks</tag> tag that can occur at the end of a topic.
Suppose you wanted to turn this list of links into a section with a
default title. You can use a slightly more complicated remap annotation
to accomplish that:
</para>

<programlisting>dita.relatedlinks =
   [
      r:remap [
         db:section [
            role="dita-relatedlinks"
            db:info [
               db:title [ "Related Links" ]
            ]
            db:para [
               r:content []
            ]
         ]
      ]
   ]
   element relatedlinks {
      dita.relatedlinks.attlist,
      dita.relatedlinks.info,
      db.link.inlines+
   }</programlisting>

<para xml:id="p63">That annotation will wrap the body of the
<tag>relatedlinks</tag> element inside a <tag>para</tag> inside a
<tag>section</tag> with the
<tag>title</tag> “Related Links”.</para>

<para xml:id="p52">The extent of the transformations that you can do
today is fairly limited (isomorphism or wrapping the content in some
structure). But if I imagine a world in the not too distant future
where there's a standard XML Pipeline language for processing a
sequence of transformations, it's easy to imagine that XSLT templates
could be used as annotations, giving extension writers almost complete
freedom.</para>
</section>

<section xml:id="whatsleft">
<title>What's Left?</title>

<para xml:id="p53">With a couple of hours of hacking, I've implemented
on top of DocBook the four key features of DITA that I could identify.
(If there are more, bring them on; DocBook can do them too!)
In doing so, I've attempted to remain true to the spirit of DocBook,
so my content models aren't exactly the same as the DITA models, but I
think the analogies are sound.</para>

<para xml:id="p54">That means the choice of which vocabulary to use,
DocBook or DITA, comes down
simply to the actual terms in the vocabulary, the elements and
attributes provided, their semantics, and their relationships to each
other. On that score, I think DocBook is the hands-down winner.</para>

<para xml:id="p55">But I was bound to say that, wasn't I?</para>
</section>

<section xml:id="thesource">
<title>The Source</title>

<para xml:id="p56">My experiment to implement DITA on top of DocBook includes:</para>

<variablelist>
<varlistentry>
<term>A schema (<link xlink:href="examples/dita4db.rng">RNG</link> or
<link xlink:href="examples/dita4db.rnc">RNC</link>)</term>
<listitem>
	  <para xml:id="p57">The schema is a DocBook 5.0 extension
that defines a new top-level element, the <tag>topic</tag>. In the
interest of modelling DITA, it also defines a <tag>task</tag> with the
same general structure as a DITA task, a <tag>concept</tag>, and a
<tag>reference</tag> as specializations of <tag>topic</tag>.</para>

<para xml:id="p58">I really don't understand the structure of a DITA
<tag>task</tag> with its body elements that are just like paragraphs.
How a technical vocabulary could expect every task to have
pre- and post-requisites, a context, a result, and (a single!) example
such that each
fits into a single paragraph is beyond me. If there's ever a move to
standardize my DITA customizations of DocBook, I think <tag>task</tag>
can be done better. (There's also the issue of the existing,
distinct <tag>task</tag> element already in DocBook, but that's a different
problem.)</para>
	</listitem>
</varlistentry>
<varlistentry>
<term>A <link xlink:href="examples/dita4db.xsl">stylesheet</link></term>
<listitem>
	  <para xml:id="p59">The stylesheet is a customization of the
DocBook XSLT2 Stylesheets. It handles the semantics of the simple map
files I outlined above, supports <tag class="attribute">conref</tag>,
and implements the DITA fragment identifier syntax. I incorporated the
schema support into the base stylesheets.</para>
	</listitem>
</varlistentry>
<varlistentry>
<term>An example</term>
<listitem>
	  <para xml:id="p60">My example is just a toy, but it has
several parts: a
<link xlink:href="examples/map.xml">map</link>, a
<link xlink:href="examples/topic1.xml">“main” topic</link>, a
<link xlink:href="examples/topic2.xml">“subordinate” topic</link>, and a
<link xlink:href="examples/task1.xml">task</link>.</para>
<para xml:id="p61">Run them through the stylesheets and you get a
<link xlink:href="examples/normalized.xml">“normalized” document</link>
which is formatted
<link xlink:href="examples/example.html">as you'd expect</link>.
</para>
	</listitem>
</varlistentry>
</variablelist>

<para xml:id="p62">Of all the pieces involved, supporting a more
robust map file is probably the most interesting. But it wouldn't be
difficult.</para>

</section>
</essay>

