DocBook NG: The “Hard Cider” Release

Volume 8, Issue 10; 18 Jan 2005; last modified 08 Oct 2010

This release includes a large suite of small improvements and bug fixes, but the big news is a first experimental DTD version.

This is the eighth release of DocBook NG.

As I mentioned before, one of the significant hurdles in getting to DocBook V5.0 is producing some sort of DTD. The nature of DTD validation necessarily means that the DTD will be less restrictive than the RELAX NG grammar (in other words, it will incorrectly report some documents as valid when they really aren't), but I still think we need to produce one.

I spent most of my plane ride on Monday cleaning up a tool chain that can produce a DTD from the DocBook RELAX NG version, and the “Hard Cider” release includes the results of that tool chain.

The process will need to be documented in some detail, but here's a 10,000-foot summary:

  1. Recursively expand all the patterns in the grammar until all of the references are to patterns that define elements.

  2. Collapse and coalesce redundant groupings. For example, ((a|b)|(c|d))* is the same as (a|b|c|d)*.

The result of these first two steps, by the way, is also the input to the tool that builds the reference documentation.
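As a rough illustration of step 2 only (this is not the actual tool chain, whose implementation isn't shown here), the following Python sketch flattens nested choices. The `Choice` and `Star` classes and the `flatten` and `show` helpers are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Choice:
    """A choice group such as (a|b)."""
    items: Tuple["Node", ...]


@dataclass(frozen=True)
class Star:
    """Zero-or-more repetition of a pattern."""
    item: "Node"


# An element reference is represented by its name as a plain string.
Node = Union[str, Choice, Star]


def flatten(node):
    """Merge choices nested directly inside other choices."""
    if isinstance(node, Star):
        return Star(flatten(node.item))
    if isinstance(node, Choice):
        merged = []
        for child in node.items:
            child = flatten(child)
            if isinstance(child, Choice):
                # (x|(y|z)) accepts the same content as (x|y|z).
                merged.extend(child.items)
            else:
                merged.append(child)
        return Choice(tuple(merged))
    return node


def show(node):
    """Render a pattern in DTD-like content model notation."""
    if isinstance(node, Star):
        inner = show(node.item)
        return inner + "*" if inner.startswith("(") else "(" + inner + ")*"
    if isinstance(node, Choice):
        return "(" + "|".join(show(item) for item in node.items) + ")"
    return node


model = Star(Choice((Choice(("a", "b")), Choice(("c", "d")))))
print(show(model))           # ((a|b)|(c|d))*
print(show(flatten(model)))  # (a|b|c|d)*
```

A real grammar also has sequences, optional patterns, and interleaves. This sketch covers only the choice-within-choice case from the example above.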

  1. If a content model includes text, reduce it to (#PCDATA|…)* regardless of its more complex structure.

  2. If an element appears more than once in a choice, discard one of them.

  3. Handle a few special cases that were just too hard to generalize (merging the CALS and HTML table models, for example).

  4. Make a parameter entity for common attributes.

  5. Write the result out in DTD syntax.
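The mixed-content, duplicate-removal, parameter-entity, and output steps might look something like the Python sketch below. It is an illustration under stated assumptions, not the code behind this release: the function names, the `common.attributes` entity name, and the sample attribute list are all made up for the example.

```python
def dedupe(names):
    """Keep the first occurrence of each element name, preserving order."""
    seen = []
    for name in names:
        if name not in seen:
            seen.append(name)
    return seen


def mixed_model(children):
    """Reduce a text-bearing content model to the flat (#PCDATA|...)* form.

    Whatever structure the original pattern had is thrown away, because
    DTD mixed content allows only a repeated choice led by #PCDATA.
    """
    return "(" + "|".join(["#PCDATA"] + dedupe(children)) + ")*"


def element_decl(name, model):
    """Write one element declaration in DTD syntax."""
    return "<!ELEMENT {} {}>".format(name, model)


def common_attrs_entity(entity_name, attrs):
    """Declare a parameter entity holding the shared attribute definitions."""
    body = " ".join("{} {} {}".format(*attr) for attr in attrs)
    return '<!ENTITY % {} "{}">'.format(entity_name, body)


def attlist_decl(name, entity_name):
    """Reference the shared attributes from an element's ATTLIST."""
    return "<!ATTLIST {} %{};>".format(name, entity_name)


# Hypothetical inputs: an element whose expanded pattern mixes text with
# inline elements, one of which appears twice in the choice.
attrs = [("xml:id", "ID", "#IMPLIED"), ("xml:lang", "CDATA", "#IMPLIED")]

print(common_attrs_entity("common.attributes", attrs))
print(element_decl("para", mixed_model(["emphasis", "link", "emphasis"])))
print(attlist_decl("para", "common.attributes"))
```

The special cases in step 3, such as the merged table models, are deliberately left out. By the description above they resisted generalization, so they would presumably need hand-written handling.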

The astute will observe that this is a pretty crude solution. A better answer would be to construct a data structure representing the grammar and analyze it carefully, relying on an implicit understanding of the semantics to construct “the right” DTD. If you build that tool, please tell me about it. In the meantime, I think the approach I've outlined above will satisfy our purposes for DocBook V5.0.

In addition to the DTD, there are a number of user-visible changes in the “Hard Cider” release:

  • Allow colophon at the end of an article, RFE #1070458.

  • Allow navigation components (index, glossary, etc.) at the end of sections.

  • Allow xml:space (with the value “preserve”) on verbatim environments.

  • Make revnumber optional in revision, RFE #1055480.

  • Added “protocol” to the list of class values on systemitem.

  • Add citation and citetitle to attribution.

  • Added alt and annotation.

  • Added rowheader to table and informaltable.

  • Made title required on preface. It always should have been.

  • Added contractsponsor, contractnum, and mediaobject to the content of info.

  • Allow text where a proper date used to be required (pubdate and friends).

  • Allow endterm on link.

  • Allow refsection as a “start” element.

  • Allow initializer in paramdef.

Most of these changes make DocBook NG more compatible with DocBook V4.x.

Comments

Sounds vaguely like a Schema to DTD converter I wrote a while back, in XSLT. I found that it was possible to convert the content models for mixed content into flat DTD models by some creative use of the 'translate' function, I seem to remember. Hack, hack, hack. Cheers, Tony.

—Posted by Anthony B. Coates on 11 Feb 2005 @ 08:19 UTC #