<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="pto" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:gal="http://norman.walsh.name/rdf/gallery#">
<info>
    
    
    
    
    
    
    
    
    
<title>DocBook NG: The “Absinthe” Release</title><biblioid class="uri">http://norman.walsh.name/2004/01/01/absinthe</biblioid>
<volumenum>7</volumenum>
<issuenum>1</issuenum>
<pubdate>2004-01-01</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2004</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>I’ve talked about refactoring DocBook before, and over the past few
days I’ve tried to pull together a solid implementation of those
ideas. I think the results show a lot of promise.
</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#DocBook"/>
</info>

<epigraph>
<attribution>
      <personname>
<firstname>Steven</firstname>
	<surname>Levy</surname>
</personname>
    </attribution>
<para xml:id="p1">There has never been an unexpectedly short debugging period in
the history of computers.</para>
</epigraph>

<para xml:id="p2">I’ve 
<link xlink:href="http://norman.walsh.name/threads/refactorDocBook">talked about</link>
refactoring DocBook before, and over the past few days I’ve tried to pull
together a solid implementation of those ideas. I’ve released it as
<link xlink:href="http://docbook.org/docbook-ng/">DocBook NG: The “Absinthe” Release</link>.
It’s called “DocBook NG” because (a) it’s just my experiment, so it’d be a bit
presumptuous for me to call it 5.<replaceable>x</replaceable> and (b) it’s only
available as a <link xlink:href="http://www.relaxng.org/">RELAX NG</link> grammar today. Eventually there will be DTD and W3C XML
Schema versions of it.</para>

<para xml:id="p3">It’s called “Absinthe” because I thought it would be
more fun to name the releases than call them α1, α2, α3, etc. In the spirit
of New Year’s Day and a little “hair of the dog”, the theme is
potent potables. (Deb<indexterm>
<primary>Walsh</primary>
      <secondary>Deborah</secondary>
    </indexterm> actually
suggested the theme; I had something quite lame in mind.)</para>

<para xml:id="p4">The most important point I want to make about DocBook NG is that
I think it is still very much DocBook in spirit. Another point is
that I want you to try it. There’s a stylesheet in the distribution
that will convert DocBook documents into DocBook NG documents.
Convert everything you’ve got, find out what works and what doesn’t,
what should and what shouldn’t, and let me know.</para>

<para xml:id="p5">I published a special
version of <citetitle>DocBook: The Definitive Guide</citetitle> with
<link xlink:href="http://docbook.org/tdg/en/html-ng/part2.html">reference pages</link>
that show the content models of DocBook V4.3CR2 and DocBook NG
“Absinthe” side-by-side.</para>

<para xml:id="p6">Here’s a recap of some of the significant changes:</para>

<section xml:id="tight">
<title>Tightened Constraints</title>

<para xml:id="p7">RELAX NG can express constraints in the grammar that we could
previously only express in the documentation. For example, <tag class="element">biblioid</tag>
has two attributes, <tag class="attribute">class</tag> and
<tag class="attribute">otherclass</tag>, with the semantic that
<tag class="attribute">otherclass</tag> is required if
<literal>class="other"</literal> and forbidden otherwise. In RELAX NG, we can
enforce that constraint.</para>
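<para>As a rough illustration, the constraint could be written in RELAX NG’s XML syntax as a
<tag class="element">choice</tag> between two attribute patterns. This is a minimal sketch, not
the DocBook NG source: the list of <tag class="attribute">class</tag> values is an illustrative
subset, the element is reduced to plain text content, and <tag class="attribute">class</tag>
itself is assumed to be optional.</para>

<screen><![CDATA[<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="biblioid"/>
  </start>

  <define name="biblioid">
    <element name="biblioid">
      <optional>
        <choice>
          <!-- An ordinary class: otherclass is not allowed -->
          <attribute name="class">
            <choice>
              <value>doi</value>
              <value>isbn</value>
              <value>issn</value>
              <value>uri</value>
            </choice>
          </attribute>
          <!-- class="other": otherclass is required -->
          <group>
            <attribute name="class">
              <value>other</value>
            </attribute>
            <attribute name="otherclass"/>
          </group>
        </choice>
      </optional>
      <text/>
    </element>
  </define>
</grammar>]]></screen>

<para>Against this grammar, a <tag class="element">biblioid</tag> with
<literal>class="isbn"</literal> and an <tag class="attribute">otherclass</tag> is invalid, and so
is one with <literal>class="other"</literal> but no <tag class="attribute">otherclass</tag>.</para>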
</section>

<section xml:id="context">
<title>Context Dependent Content Models</title>

<para xml:id="p8">We can also tighten up constraints in content models. Consider
<link xlink:href="http://docbook.org/tdg/en/html-ng/table.html"><tag class="element">table</tag></link>.
In DocBook V4.3, we introduced HTML tables alongside CALS tables. To
support this in the DTD, the content model for
<tag class="element">table</tag> had to be constructed so that it was the
union of both models. This allows not only HTML tables and CALS tables
but also hybrid tables that are neither HTML nor CALS. In DocBook NG,
the two definitions are entirely separate: you can have CALS tables or
HTML tables, but nothing in between.</para>
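<para>One way to express “one or the other, but nothing in between” in RELAX NG is to give
<tag class="element">table</tag> two complete definitions and offer them as a choice. The
fragment below is heavily simplified and hypothetical: attributes are omitted, and the
referenced patterns (<literal>title</literal>, <literal>tgroup</literal>,
<literal>caption</literal>, and so on) are assumed to be defined elsewhere in the grammar.</para>

<screen><![CDATA[<define name="table">
  <choice>
    <ref name="cals.table"/>
    <ref name="html.table"/>
  </choice>
</define>

<!-- CALS flavor: a title followed by one or more tgroups -->
<define name="cals.table">
  <element name="table">
    <ref name="title"/>
    <oneOrMore><ref name="tgroup"/></oneOrMore>
  </element>
</define>

<!-- HTML flavor: caption, column specs, then row groups or rows -->
<define name="html.table">
  <element name="table">
    <ref name="caption"/>
    <choice>
      <zeroOrMore><ref name="col"/></zeroOrMore>
      <zeroOrMore><ref name="colgroup"/></zeroOrMore>
    </choice>
    <optional><ref name="thead"/></optional>
    <optional><ref name="tfoot"/></optional>
    <choice>
      <oneOrMore><ref name="tbody"/></oneOrMore>
      <oneOrMore><ref name="tr"/></oneOrMore>
    </choice>
  </element>
</define>]]></screen>

<para>Because each branch is a complete content model, a table that mixes a CALS
<tag class="element">tgroup</tag> with HTML <tag class="element">tr</tag> rows matches neither
branch and fails validation. A DTD cannot do this, since it allows only one declaration per
element name.</para>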
</section>

<section xml:id="linking">
<title>Common Linking Attributes</title>

<para xml:id="p9">DocBook NG attempts to solve the “ubiquitous linking” problem by allowing
either <tag class="attribute">linkend</tag> <emphasis>or</emphasis>
<tag class="attribute">href</tag> on most elements. So <tag class="element">ulink</tag>
doesn’t exist anymore, but you can say:</para>

<screen>I prefer
&lt;command href="/manual/emacs/"&gt;emacs&lt;/command&gt;
for editing my documents</screen>

<para xml:id="p10">That has the same effect as wrapping the command in a
<tag class="element">ulink</tag>. Or it would if the stylesheets supported
DocBook NG, which they don’t. Yet.</para>
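<para>A rough sketch of how such a shared attribute pattern might look in RELAX NG’s XML syntax:
at most one of <tag class="attribute">linkend</tag> or <tag class="attribute">href</tag>,
attached here to <tag class="element">command</tag>. The pattern name
<literal>linking.attributes</literal> is hypothetical, and <tag class="element">command</tag> is
reduced to plain text; in a full schema the same reference would appear in most element
definitions.</para>

<screen><![CDATA[<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="command"/>
  </start>

  <!-- Either linkend or href (or neither), but never both -->
  <define name="linking.attributes">
    <optional>
      <choice>
        <attribute name="linkend">
          <data type="IDREF"/>
        </attribute>
        <attribute name="href">
          <data type="anyURI"/>
        </attribute>
      </choice>
    </optional>
  </define>

  <define name="command">
    <element name="command">
      <ref name="linking.attributes"/>
      <text/>
    </element>
  </define>
</grammar>]]></screen>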
</section>

<section xml:id="smaller">
<title>Fewer Choices</title>

<para xml:id="p11">A lot of content models in DocBook are too big. This has been true for a long
time and has been the subject of perennial improvement plans. Consider
<tag class="element">citation</tag>:</para>

<screen>citation ::=
(#PCDATA|footnoteref|xref|abbrev|acronym|
 citation|citerefentry|citetitle|emphasis|
 firstterm|foreignphrase|glossterm|
 footnote|phrase|orgname|quote|trademark|
 wordasword|personname|link|olink|ulink|
 action|application|classname|methodname|
 interfacename|exceptionname|ooclass|
 oointerface|ooexception|command|
 computeroutput|database|email|envar|
 errorcode|errorname|errortype|errortext|
 filename|function|guibutton|guiicon|…)*</screen>

<para xml:id="p12">Does it really make sense to have a <tag class="element">citerefentry</tag> in a
citation? Probably not. So in DocBook NG, I made the content model much smaller:</para>

<literallayout><literal>citation ::=</literal>
    • Zero or more of:
          ◦ <literal>anchor</literal>
          ◦ <literal>indexterm</literal> (indexterm.endofrange)
          ◦ <literal>indexterm</literal> (indexterm.singular)
          ◦ <literal>indexterm</literal> (indexterm.startofrange)
          ◦ <literal>inlinemediaobject</literal>
          ◦ <literal>phrase</literal> (text.phrase)
          ◦ <literal>remark</literal>
          ◦ <literal>replaceable</literal></literallayout>

<para xml:id="p13">Too small? Perhaps, but that should turn up pretty quickly in
testing.</para>
</section>

<section xml:id="info">
<title>Info Elements</title>

<para xml:id="p14">There’s a single <tag class="element">info</tag> wrapper now (instead of
<tag class="element">bookinfo</tag>, <tag class="element">chapterinfo</tag>, etc.).
The <tag class="element">info</tag> wrapper also occurs in several more places
and comes in three flavors to establish greater consistency. Consider
<tag class="element">procedure</tag>:</para>

<literallayout><literal>procedure ::=</literal>
    • Sequence of:
          ◦ One of:
                ▪ Sequence of:
                      ▪ Interleave of:
                            ▪ <literal>title</literal>?
                            ▪ <literal>titleabbrev</literal>?
                      ▪ <literal>info</literal>? (db.info.titleforbidden)
                ▪ <literal>info</literal> (db.info.titleonly)
          ◦ Zero or more of:
                ▪ <literal>address</literal>
                ▪ <literal>anchor</literal>
                ▪ …
          ◦ One or more of:
                ▪ <literal>step</literal></literallayout>

<para xml:id="p15">What does this say? Working from the inside out, it says that a
procedure can either have an optional <tag class="element">title</tag> and an optional
<tag class="element">titleabbrev</tag>, in either order and each at most once,
followed by an optional <tag class="element">info</tag> that forbids titles, or it can
have an <tag class="element">info</tag> that allows titles. After the title markup,
it can have any number of blocks followed by at least one
<tag class="element">step</tag>.</para>
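<para>The same model can be sketched as a RELAX NG fragment, which may make the nesting easier
to see. The two <tag class="element">info</tag> flavors use the pattern names shown above;
<literal>procedure.blocks</literal> is a hypothetical stand-in for the elided list of blocks,
and all referenced patterns are assumed to be defined elsewhere.</para>

<screen><![CDATA[<define name="procedure">
  <element name="procedure">
    <choice>
      <!-- Either: optional title and titleabbrev (any order, each at
           most once), then an optional info that may not hold titles -->
      <group>
        <interleave>
          <optional><ref name="title"/></optional>
          <optional><ref name="titleabbrev"/></optional>
        </interleave>
        <optional><ref name="db.info.titleforbidden"/></optional>
      </group>
      <!-- Or: an info that carries the title markup itself -->
      <ref name="db.info.titleonly"/>
    </choice>
    <!-- Any number of blocks, then at least one step -->
    <zeroOrMore><ref name="procedure.blocks"/></zeroOrMore>
    <oneOrMore><ref name="step"/></oneOrMore>
  </element>
</define>]]></screen>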

</section>

<section xml:id="exclusions">
<title>Exclusions, Sort Of</title>

<para xml:id="p16">There’s a real tension in schema design between simplicity and consistency
on the one hand and rigorous enforcement of every possible constraint on the
other.</para>

<para xml:id="p17">For example, the DocBook NG schema includes a pattern called “blocks” that
contains all the block level elements. Most content models that include block
elements do so by reference to that pattern. That’s simple and consistent.
But consider admonitions: admonitions are not allowed to nest. So we have three
choices:</para>

<orderedlist>
<listitem>
	<para xml:id="p18">We could enforce this constraint by adjusting the content model
of each admonition (<tag class="element">note</tag>, <tag class="element">caution</tag>, etc.)
so that it did not include the other admonition elements. Instead of using
“blocks”, we might use “list.blocks | para.blocks | verbatim.blocks | …”
(but explicitly <emphasis>not</emphasis> “admonition.blocks”).</para>
<para xml:id="p19">At first glance, that seems to do the trick. But wait:
paragraphs can include blocks, so although this would exclude
admonitions from appearing directly inside admonitions, it would still
allow admonitions inside paragraphs inside admonitions.</para>
<para xml:id="p20">There’s no question that RELAX NG is powerful enough to express the constraint
we want directly, but it would require multiple definitions for almost all of the
possible descendants of admonitions. And similar constraints in other places would
quickly result in a combinatorial explosion of patterns. So that won’t work.</para>
</listitem>
<listitem>
	<para xml:id="p21">We could simply not enforce the constraint, or “enforce” it only in the
documentation or in other tools.</para>
</listitem>
<listitem>
	<para xml:id="p22">Or we could take advantage of another schema technology, such as
Schematron.</para>
</listitem>
</orderedlist>
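<para>The first choice, and the gap it leaves, can be sketched in RELAX NG’s XML syntax. The
pattern names follow the prose above, but these definitions are hypothetical and simplified;
<literal>inlines</literal> and <literal>blocks</literal> stand in for the full inline and
block patterns.</para>

<screen><![CDATA[<!-- Choice 1: list the block classes an admonition accepts,
     deliberately leaving out admonition.blocks -->
<define name="note">
  <element name="note">
    <optional><ref name="title"/></optional>
    <oneOrMore>
      <choice>
        <ref name="list.blocks"/>
        <ref name="para.blocks"/>
        <ref name="verbatim.blocks"/>
      </choice>
    </oneOrMore>
  </element>
</define>

<!-- The leak: para.blocks includes para, and para may contain
     any block, so note/para/note still validates -->
<define name="para">
  <element name="para">
    <zeroOrMore>
      <choice>
        <text/>
        <ref name="inlines"/>
        <ref name="blocks"/>
      </choice>
    </zeroOrMore>
  </element>
</define>]]></screen>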

<para xml:id="p23">Direct support for
<link xlink:href="http://www.ascc.net/xml/resource/schematron/schematron.html">Schematron</link>
validation inside tools like
<link xlink:href="http://msv.dev.java.net/"><application>msv</application></link>
makes this a very attractive option.
So the DocBook NG patterns for admonitions include Schematron rules that enforce
exclusion constraints. Sweet.</para>
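<para>A Schematron rule along these lines checks the ancestors of every admonition, so it
catches nesting at any depth, including an admonition inside a paragraph inside another
admonition. This is a sketch in Schematron 1.5 syntax, assuming the five standard DocBook
admonitions and no namespace (as in this release); it is not the rule that ships with
DocBook NG.</para>

<screen><![CDATA[<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern name="Admonitions may not nest">
    <!-- Fires on every admonition in the document -->
    <sch:rule context="caution | important | note | tip | warning">
      <!-- ancestor:: looks through any number of intervening
           elements, so note/para/note is caught too -->
      <sch:assert test="not(ancestor::caution or ancestor::important
                            or ancestor::note or ancestor::tip
                            or ancestor::warning)">
        An admonition may not appear inside another admonition.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>]]></screen>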

<para xml:id="p24">Now, in point of fact, the DocBook NG schema is built from
sources that I “compile” into the actual schema. So all I actually
have to do in the DocBook NG source to set up an exclusion is add
an annotation:</para>

<screen>ctrl:exclude [ from="admonition.blocks"
               exclude="admonition.blocks" ]</screen>

<para xml:id="p25">The sources and the build system are available from the
<link xlink:href="http://sourceforge.net/projects/docbook/">DocBook project</link>
on
<link xlink:href="http://sourceforge.net/">SourceForge</link>. They are very
experimental and I’m reasonably confident that they’ll change in significant
ways before all is said and done.</para>

<para xml:id="p26">Another nice feature of this build strategy is that I’ll be able to produce
a Schematron schema that expresses many of the constraints in DocBook NG that
can’t be expressed in DTDs. So when there’s a DTD version, which will of necessity
be much more liberal than the RELAX NG version, there will also be a Schematron
schema to use as an adjunct, if you wish.</para>
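<para>Taking the <tag class="element">biblioid</tag> constraint from earlier as an example: a
DTD can declare the two attributes, but only as independently optional, while a Schematron rule
can tie <tag class="attribute">otherclass</tag> to the value of
<tag class="attribute">class</tag>. Both fragments are sketches with an illustrative subset of
class values, not text from DocBook NG.</para>

<screen><![CDATA[<!-- DTD: both attributes are simply optional -->
<!ATTLIST biblioid
    class       (doi|isbn|issn|uri|other)   #IMPLIED
    otherclass  CDATA                       #IMPLIED>]]></screen>

<screen><![CDATA[<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern name="biblioid otherclass">
    <sch:rule context="biblioid">
      <!-- otherclass must be present exactly when class="other" -->
      <sch:assert test="(@class = 'other') = boolean(@otherclass)">
        otherclass is required if class="other" and forbidden otherwise.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>]]></screen>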

</section>

<section xml:id="discarded">
<title>Discarded Elements</title>

<para xml:id="p27">A little housecleaning was definitely in order. Some elements have been
replaced by a single element in several flavors; others have simply been discarded.
A few of these may be controversial. They can always be added again.</para>

<para xml:id="p28">All the specialized info elements (<tag class="element">bookinfo</tag>,
<tag class="element">chapterinfo</tag>, etc.) have been replaced by the single
<tag class="element">info</tag> element in its several flavors.</para>

<para xml:id="p29">Several of the linking elements are gone in favor of common linking attributes:
<tag class="element">link</tag>, <tag class="element">olink</tag>, and <tag class="element">ulink</tag>.</para>

<para xml:id="p30">DocBook NG has recursive <tag class="element">section</tag> and
<tag class="element">refsection</tag> elements, but it no longer has
<tag class="element">sect<replaceable>n</replaceable></tag> or
<tag class="element">refsect<replaceable>n</replaceable></tag> elements.</para>

<para xml:id="p31">All of the <tag class="element">lot</tag>/<tag class="element">toc</tag> machinery has
been simplified.</para>

<para xml:id="p32">Some other elements have been tossed onto the scrap heap:
<tag class="element">action</tag>,
<tag class="element">alt</tag>,
<tag class="element">authorblurb</tag> (use <tag class="element">personblurb</tag>),
<tag class="element">beginpage</tag>,
<tag class="element">collabname</tag>,
<tag class="element">corpauthor</tag>,
<tag class="element">corpcredit</tag>,
<tag class="element">corpname</tag> (use <tag class="element">orgname</tag>),
<tag class="element">graphic</tag>,
<tag class="element">graphicco</tag>,
<tag class="element">inlinegraphic</tag> (use flavors of <tag class="element">mediaobject</tag>),
<tag class="element">interface</tag>,
<tag class="element">invpartnumber</tag>,
<tag class="element">isbn</tag>,
<tag class="element">issn</tag> (use flavors of <tag class="element">biblioid</tag>),
<tag class="element">medialabel</tag>,
<tag class="element">modespec</tag>,
<tag class="element">property</tag>,
<tag class="element">pubsnumber</tag>,
<tag class="element">tag</tag> (renamed <tag class="element">xmltag</tag>),
<tag class="element">structfield</tag>, and
<tag class="element">structname</tag>.</para>
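<para>Several of these changes are mechanical renames. A minimal sketch of how a few of them
could be automated with XSLT 1.0 is an identity transform plus overriding templates. It is not
the conversion stylesheet in the distribution, and it ignores everything that needs real
restructuring (the info wrappers, linking, graphics).</para>

<screen><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity: copy everything not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- sect1 through sect5 become recursive section elements -->
  <xsl:template match="sect1|sect2|sect3|sect4|sect5">
    <section>
      <xsl:apply-templates select="@*|node()"/>
    </section>
  </xsl:template>

  <!-- refsect1 through refsect3 become refsection -->
  <xsl:template match="refsect1|refsect2|refsect3">
    <refsection>
      <xsl:apply-templates select="@*|node()"/>
    </refsection>
  </xsl:template>

  <!-- Simple one-for-one replacements -->
  <xsl:template match="corpname">
    <orgname>
      <xsl:apply-templates select="@*|node()"/>
    </orgname>
  </xsl:template>

  <xsl:template match="tag">
    <xmltag>
      <xsl:apply-templates select="@*|node()"/>
    </xmltag>
  </xsl:template>
</xsl:stylesheet>]]></screen>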
</section>

<section xml:id="namespace">
<title>No Namespace</title>
<para xml:id="p33">I haven’t put DocBook NG in a namespace, but I still think it should be in
one. I’ve figured out how to handle this dichotomy in the XSL stylesheets (though
it won’t be terribly efficient), but I’m worried about other tools.</para>
<para xml:id="p34">Maybe in the next release.</para>
</section>

</essay>

