DocBook NG: The “Absinthe” Release

Volume 7, Issue 1; 01 Jan 2004

I’ve talked about refactoring DocBook before and over the past few days I’ve tried to pull together a solid implementation of those ideas. I think the results show a lot of promise.

There has never been an unexpectedly short debugging period in the history of computers.

—Steven Levy

I’ve talked about refactoring DocBook before and over the past few days I’ve tried to pull together a solid implementation of those ideas. I’ve released it as DocBook NG: The “Absinthe” Release. It’s called “DocBook NG” because (a) it’s just my experiment so it’d be a bit presumptuous for me to call it 5.x and (b) it’s only available as a RELAX NG grammar today. Eventually there will be DTD and W3C XML Schema versions of it.

It’s called “Absinthe” because I thought it would be more fun to name the releases than to call them α1, α2, α3, etc. In the spirit of New Year’s Day and a little “hair of the dog”, the theme is potent potables. (Deb actually suggested the theme; I had something quite lame in mind.)

The most important point I want to make about DocBook NG is that I think it is still very much DocBook in spirit. Another point is that I want you to try it. There’s a stylesheet in the distribution that will convert DocBook documents into DocBook NG documents. Convert everything you’ve got, find out what works and what doesn’t, what should and what shouldn’t, and let me know.

I published a special version of DocBook: The Definitive Guide with reference pages that show the content models of DocBook V4.3CR2 and DocBook NG “Absinthe” side-by-side.

Here’s a recap of some of the significant changes:

Tightened Constraints

RELAX NG can express constraints in the grammar that we could previously only express in the documentation. For example, biblioid has two attributes, class and otherclass, with the semantic that otherclass is required if class="other" and forbidden otherwise. In RELAX NG, we can enforce that constraint.
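
In the RELAX NG grammar this rule is part of the pattern for biblioid, so any RELAX NG validator checks it. To spell out exactly what is being checked, here is a minimal Python sketch that uses only the standard library's ElementTree. It is a hand-written illustration of the rule, not part of DocBook NG, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_biblioid(elem: ET.Element) -> list:
    """Check the class/otherclass co-constraint on one biblioid element.

    Rule, as stated above: otherclass is required when class="other"
    and forbidden otherwise (including when class is absent).
    Returns a list of problem descriptions; empty means the element is OK.
    """
    problems = []
    is_other = elem.get("class") == "other"
    has_otherclass = elem.get("otherclass") is not None
    if is_other and not has_otherclass:
        problems.append('class="other" requires otherclass')
    if not is_other and has_otherclass:
        problems.append('otherclass is forbidden unless class="other"')
    return problems


# Usage: one valid instance and two that violate the rule.
samples = [
    '<biblioid class="other" otherclass="local">A-17</biblioid>',
    '<biblioid class="other">A-17</biblioid>',
    '<biblioid class="isbn" otherclass="local">0-000-00000-0</biblioid>',
]
for xml in samples:
    print(check_biblioid(ET.fromstring(xml)))
```

A DTD can declare both attributes but cannot tie the presence of one to the value of the other, which is why this check previously lived only in the documentation.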

Context Dependent Content Models

We can also tighten up constraints in content models. Consider table. In DocBook V4.3, we introduced HTML tables alongside CALS tables. To support this in the DTD, the content model for table had to be constructed so that it was the union of both models. This allows not only HTML tables and CALS tables but also hybrid tables that are neither HTML nor CALS. In DocBook NG, the two definitions are entirely separate: you can have CALS tables or HTML tables, but nothing in between.
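
Because the two definitions are separate, every table has to commit to one model. The following Python sketch classifies a table by its direct children to show what “nothing in between” rules out. The element sets are a simplified assumption loosely based on DocBook V4.3, and the function is hypothetical; it is not how a RELAX NG validator works internally.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed sets of direct children that only one table
# model allows. The real content models also constrain order and counts.
CALS_ONLY = {"title", "titleabbrev", "tgroup"}
HTML_ONLY = {"caption", "col", "colgroup", "thead", "tfoot", "tbody", "tr"}


def table_flavor(table: ET.Element) -> str:
    """Classify a table as "cals", "html", "hybrid", or "empty"."""
    tags = {child.tag for child in table}
    uses_cals = bool(tags & CALS_ONLY)
    uses_html = bool(tags & HTML_ONLY)
    if uses_cals and uses_html:
        return "hybrid"  # neither CALS nor HTML
    if uses_cals:
        return "cals"
    if uses_html:
        return "html"
    return "empty"


print(table_flavor(ET.fromstring('<table><title>T</title><tgroup cols="1"/></table>')))
print(table_flavor(ET.fromstring("<table><caption>T</caption><tr><td>x</td></tr></table>")))
print(table_flavor(ET.fromstring("<table><title>T</title><tr><td>x</td></tr></table>")))
```

With two separate definitions, only the first two cases can match a table pattern; the third matches neither.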

Common Linking Attributes

DocBook NG attempts to solve the “ubiquitous linking” problem by allowing either linkend or href on most elements. So ulink doesn’t exist anymore, but you can say:

I prefer <command href="/manual/emacs/">emacs</command> for editing my documents

This produces the same effect as ulink. Or it would, if the stylesheets supported DocBook NG, which they don’t. Yet.
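
With common linking attributes, a processor can no longer find links by looking for a handful of link elements; it has to look at the attributes of every element. Here is a minimal Python sketch of that scan using the standard library's ElementTree. It uses plain href and linkend attributes, as in the example above; the function name and the ch-setup ID are hypothetical.

```python
import xml.etree.ElementTree as ET


def link_sources(root: ET.Element) -> list:
    """Return (element name, attribute, target) for every element that links.

    href points at a URI; linkend points at an ID in the same document.
    Any element may carry either, so every element is inspected.
    """
    found = []
    for elem in root.iter():  # document order, root included
        for attr in ("href", "linkend"):
            target = elem.get(attr)
            if target is not None:
                found.append((elem.tag, attr, target))
    return found


doc = ET.fromstring(
    '<para>I prefer <command href="/manual/emacs/">emacs</command> for editing '
    'my documents; see <emphasis linkend="ch-setup">setup</emphasis>.</para>'
)
for source in link_sources(doc):
    print(source)
```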

Fewer Choices

A lot of content models in DocBook are too big. This has been true for a long time and has been the subject of perennial improvement plans. Consider citation:

citation ::=
(#PCDATA|footnoteref|xref|abbrev|acronym|
 citation|citerefentry|citetitle|emphasis|
 firstterm|foreignphrase|glossterm|
 footnote|phrase|orgname|quote|trademark|
 wordasword|personname|link|olink|ulink|
 action|application|classname|methodname|
 interfacename|exceptionname|ooclass|
 oointerface|ooexception|command|
 computeroutput|database|email|envar|
 errorcode|errorname|errortype|errortext|
 filename|function|guibutton|guiicon|…)*

Does it really make sense to have a citerefentry in a citation? Probably not. So in DocBook NG, I made the content model much smaller:

citation ::=
    • Zero or more of:
          ◦ anchor
          ◦ indexterm (indexterm.endofrange)
          ◦ indexterm (indexterm.singular)
          ◦ indexterm (indexterm.startofrange)
          ◦ inlinemediaobject
          ◦ phrase (text.phrase)
          ◦ remark
          ◦ replaceable

Too small? Perhaps, but that should turn up pretty quickly in testing.
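
As a rough first pass at that testing, this hypothetical Python sketch lists the children of citation elements that the smaller model no longer allows. It collapses the flavors (any indexterm or phrase is accepted) and ignores text content, so treat it as a quick survey of converted documents rather than validation.

```python
import xml.etree.ElementTree as ET

# Children allowed in citation under the smaller model, taken from the
# list above with the indexterm and phrase flavors collapsed.
CITATION_ALLOWED = {
    "anchor", "indexterm", "inlinemediaobject", "phrase", "remark", "replaceable",
}


def citation_violations(root: ET.Element) -> list:
    """List child elements of citation that the smaller model rejects."""
    bad = []
    for citation in root.iter("citation"):
        for child in citation:  # direct children only
            if child.tag not in CITATION_ALLOWED:
                bad.append(child.tag)
    return bad


doc = ET.fromstring(
    "<para><citation>Walsh, <citetitle>DocBook</citetitle>, "
    "<replaceable>year</replaceable></citation></para>"
)
print(citation_violations(doc))
```

Run over a large body of converted documents, a list like this shows which removed children are actually in use, which is the evidence needed to decide whether the model is too small.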

Info Elements

There’s a single info wrapper now (instead of bookinfo, chapterinfo, etc.). The info wrapper also occurs in several more places and comes in three flavors to establish greater consistency. Consider procedure:

procedure ::=
    • Sequence of:
          ◦ One of:
                ▪ Sequence of:
                      ▪ Interleave of:
                            ▪ title?
                            ▪ titleabbrev?
                      ▪ info? (db.info.titleforbidden)
                ▪ info (db.info.titleonly)
          ◦ Zero or more of:
                ▪ address
                ▪ anchor
                ▪ …
          ◦ One or more of:
                ▪ step

What does this say? Working from the inside out, it says that a procedure can either have an optional title and an optional titleabbrev, in any order and each at most once, followed by an optional info that forbids titles, or it can have an info that allows titles. After the title markup, it can have zero or more blocks, followed by at least one step.
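
The same reading can be written as a hand-written check. This Python sketch walks a procedure’s children in order. It is hypothetical, and it simplifies the elided block list by treating any element other than title markup and step as a block.

```python
import xml.etree.ElementTree as ET

TITLE_MARKUP = ("title", "titleabbrev")


def check_procedure(proc: ET.Element) -> bool:
    """Return True if proc's children follow the procedure pattern above."""
    kids = list(proc)
    i = 0

    # Branch 1: title? and titleabbrev?, in any order, each at most once.
    seen = set()
    while i < len(kids) and kids[i].tag in TITLE_MARKUP and kids[i].tag not in seen:
        seen.add(kids[i].tag)
        i += 1
    if i < len(kids) and kids[i].tag == "info":
        # After a title, the info must be the flavor that forbids titles.
        if seen and kids[i].find("title") is not None:
            return False
        # With no title before it, this is branch 2: an info that allows titles.
        i += 1

    # Zero or more blocks (simplified: anything but title markup, info, step).
    while i < len(kids) and kids[i].tag not in TITLE_MARKUP + ("info", "step"):
        i += 1

    # One or more steps, and nothing after them.
    steps = 0
    while i < len(kids) and kids[i].tag == "step":
        steps += 1
        i += 1
    return steps > 0 and i == len(kids)


proc = ET.fromstring(
    "<procedure><titleabbrev>T</titleabbrev><title>Title</title>"
    "<para>Intro</para><step/><step/></procedure>"
)
print(check_procedure(proc))
```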

Exclusions, Sort Of

There’s a real tension in schema design between simplicity and consistency on the one hand and rigorous enforcement of every possible constraint on the other.

For example, the DocBook NG schema includes a pattern called “blocks” that contains all the block-level elements. Most content models that include block elements do so by reference to that pattern. That’s simple and consistent. But consider admonitions: admonitions are not allowed to nest. So we have three choices:

First, we could enforce this constraint by adjusting the content model of each admonition (note, caution, etc.) so that it did not include the other admonition elements. Instead of using “blocks”, we might use “list.blocks | para.blocks | verbatim.blocks | …” (but explicitly not “admonition.blocks”).

At first glance, that seems to do the trick. But wait: paragraphs can include blocks, so although this would exclude admonitions from appearing directly inside admonitions, it would still allow admonitions inside paragraphs inside admonitions.

There’s no question that RELAX NG is powerful enough to express the constraint we want directly, but it would require multiple definitions for almost all of the possible descendants of admonitions. And similar constraints in other places would quickly result in a combinatorial explosion of patterns. So that won’t work.

Second, we could leave the constraint unenforced, or only “enforce” it in the documentation or in other tools.

Third, we could take advantage of another schema technology, such as Schematron.

Direct support for Schematron validation inside tools like msv makes this a very attractive option. So the DocBook NG patterns for admonitions include Schematron rules that enforce exclusion constraints. Sweet.
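
What makes this work is that a Schematron rule can test for descendants at any depth, which closes the paragraph-inside-admonition loophole. As a rough analogue, here is a Python sketch using ElementTree that performs the same any-depth check. The five admonition names are DocBook’s; the function itself is hypothetical and is not part of DocBook NG.

```python
import xml.etree.ElementTree as ET

ADMONITIONS = {"caution", "important", "note", "tip", "warning"}


def nested_admonitions(root: ET.Element) -> list:
    """Return (outer, inner) pairs where an admonition occurs at any depth
    inside another admonition, e.g. note > para > warning."""
    pairs = []
    for outer in root.iter():
        if outer.tag not in ADMONITIONS:
            continue
        for inner in outer.iter():  # all descendants, not just children
            if inner is not outer and inner.tag in ADMONITIONS:
                pairs.append((outer.tag, inner.tag))
    return pairs


doc = ET.fromstring(
    "<chapter><note><para>See <warning><para>x</para></warning></para></note>"
    "<tip><para>fine</para></tip></chapter>"
)
print(nested_admonitions(doc))
```

A children-only check would miss the warning above, because it sits inside a para; that is exactly the case the “list.blocks | para.blocks | …” approach could not exclude.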

Now, in point of fact, the DocBook NG schema is built from sources that I “compile” into the actual schema. So all I actually have to do in the DocBook NG source to set up an exclusion is add an annotation:

ctrl:exclude [ from="admonition.blocks"
               exclude="admonition.blocks" ]
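
As an illustration of what such a compilation step might produce, here is a hypothetical Python sketch that expands a from/exclude pair into a single Schematron rule. The pattern-to-element table, the function name, and the message text are all assumptions; this is not the DocBook NG build system.

```python
import xml.etree.ElementTree as ET

# Schematron 1.5 namespace (the version current when this was written).
SCH = "http://www.ascc.net/xml/schematron"
ET.register_namespace("sch", SCH)

# Hypothetical lookup from pattern names to element names; a real build
# would derive this from the RELAX NG sources.
PATTERNS = {
    "admonition.blocks": ["caution", "important", "note", "tip", "warning"],
}


def exclusion_rule(from_pattern: str, exclude_pattern: str) -> ET.Element:
    """Expand one from/exclude annotation into a Schematron rule element."""
    context = " | ".join(PATTERNS[from_pattern])
    # Any-depth test: .//name matches descendants, not just children.
    test = "not(" + " | ".join(f".//{n}" for n in PATTERNS[exclude_pattern]) + ")"
    rule = ET.Element(f"{{{SCH}}}rule", context=context)
    check = ET.SubElement(rule, f"{{{SCH}}}assert", test=test)
    check.text = f"{exclude_pattern} must not occur inside {from_pattern}"
    return rule


print(ET.tostring(exclusion_rule("admonition.blocks", "admonition.blocks"),
                  encoding="unicode"))
```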

The sources and the build system are available from the DocBook project on SourceForge. They are very experimental and I’m reasonably confident that they’ll change in significant ways before all is said and done.

Another nice feature of this build strategy is that I’ll be able to produce a Schematron schema that expresses many of the constraints in DocBook NG that can’t be expressed in DTDs. So when there’s a DTD version, which will of necessity be much more liberal than the RELAX NG version, there will also be a Schematron schema to use as an adjunct, if you wish.

Discarded Elements

A little housecleaning was definitely in order. Some elements have been replaced by a single element in several flavors; others have simply been discarded. A few of these may be controversial. They can always be added again.

All the flavors of info have been replaced by the single info element in several flavors.

Several of the linking elements are gone in favor of common linking attributes: link, olink, and ulink.

DocBook NG has recursive section and refsection elements, but it no longer has sectn or refsectn elements.

All of the lot/toc machinery has been simplified.

Some other elements have been tossed onto the scrap heap: action, alt, authorblurb (use personblurb), beginpage, collabname, corpauthor, corpcredit, corpname (use orgname), graphic, graphicco, inlinegraphic (use flavors of mediaobject), interface, invpartnumber, isbn, issn (use flavors of biblioid), medialabel, modespec, property, pubsnumber, tag (renamed xmltag), structfield, and structname.
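
The stylesheet in the distribution is the real conversion. To illustrate only the simplest cases, here is a hypothetical Python sketch that applies the renames stated above: sectN to section, refsectN to refsection, a partial list of the *info wrappers to info, and the two “use X” replacements. Treating “use X” as a straight rename is an assumption; a real conversion may also need to move or rename attributes.

```python
import re
import xml.etree.ElementTree as ET

# Renames taken from the text; the *info entries are a partial list.
RENAMES = {
    "authorblurb": "personblurb",
    "corpname": "orgname",
    "bookinfo": "info",
    "chapterinfo": "info",
    "articleinfo": "info",
    "sectioninfo": "info",
}


def convert(root: ET.Element) -> ET.Element:
    """Rename elements in place and return the root."""
    for elem in root.iter():
        if re.fullmatch(r"sect[1-5]", elem.tag):
            elem.tag = "section"  # recursive section replaces sect1..sect5
        elif re.fullmatch(r"refsect[1-3]", elem.tag):
            elem.tag = "refsection"
        else:
            elem.tag = RENAMES.get(elem.tag, elem.tag)
    return root


doc = ET.fromstring(
    "<chapter><chapterinfo/><sect1><sect2>"
    "<para><corpname>ACME</corpname></para></sect2></sect1></chapter>"
)
print([e.tag for e in convert(doc).iter()])
```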

No Namespace

I haven’t put DocBook NG in a namespace, but I still think it should be in one. I’ve figured out how to handle this dichotomy in the XSL stylesheets (though it won’t be terribly efficient), but I’m worried about other tools.
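
To picture the dichotomy: until the namespace question is settled, processing code has to accept every element under two names, with and without a namespace. Here is a hypothetical Python sketch of that double test. The namespace URI is a placeholder, and this is not how the XSL stylesheets do it.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, purely for illustration; the post does not
# name one.
NG_NS = "urn:example:docbook-ng"


def is_db(elem: ET.Element, name: str) -> bool:
    """True if elem is the DocBook element `name`, with or without NG_NS."""
    return elem.tag in (name, f"{{{NG_NS}}}{name}")


def titles(root: ET.Element) -> list:
    """Collect the text of every title, in either form."""
    return [e.text for e in root.iter() if is_db(e, "title")]


plain = ET.fromstring("<article><title>Plain</title></article>")
spaced = ET.fromstring(f'<article xmlns="{NG_NS}"><title>Namespaced</title></article>')
print(titles(plain), titles(spaced))
```

Every match has to carry both names, which is the kind of duplication that makes the stylesheet approach workable but not especially efficient.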

Maybe in the next release.

Comments

I read your introduction to DocBook NG with interest.

Re: Exclusions, Sort Of

How do you feel about this thread?

http://lists.oasis-open.org/archives/relax-ng/200310/msg00004.html

I remember some discussion about modularizing DocBook. Do you still think about factoring out domain-specific vocabularies (classsynopsis et al., say) into extensions/profiles? If so, that would allow other (new) domains to be added for which support has been very limited (I'm thinking of the ubiquitous 'role' attribute). Domains such as software engineering / development process ('requirement', 'use case', 'principle', etc.), modeling extensions (the aforementioned 'classsynopsis' et al.), etc. etc.

Stefan asks about modularization. Yes, the plan is definitely to allow for modularization. The sources that build the RNG schema are already modularized (don't want glossaries; don't include that module), and the built schema can be customized by redefining patterns.

I'll put together an example or two eventually.

Hi Norm,

I hope that I will have time to look at DocBook NG in much more detail later. But from a first reading of your blog message I see one problem: the xmltag element name doesn't conform to the XML Recommendation, which reserves names beginning with "xml". See my previous message on that:

http://sources.redhat.com/ml/docbook/2003-10/msg00058.html

You're absolutely right. I'm going to just use "tag" for the moment, I think.