More Ruminations on DocBook
Some ideas about what a refactored DocBook might look like, and a prototype.
It doesn't seem quite fair to suggest scrapping DocBook without at least considering what should replace it. And rather than waiting until I'm finished, I think it probably makes sense to publish what I've cooked up. It's maybe three-quarters finished, maybe a little more. In any event, it's just one guy's idea.
In general terms, my changes fall into four categories: rationalize the content model of inlines, normalize the metadata, discard cruft, and make changes that appear (to me) to simplify things.
Rationalizing Inlines
I've divided inlines into three classes: ubiquitous inlines (ones that should be available everywhere), general inlines, and domain-specific inlines.
In trying to find a design principle to discriminate between what should go in the content model of a particular inline and what should not, I eventually settled on a simple one: any given inline contains just text or it contains every inline. In my prototype, a lot of inlines contain just text.
Just Text
Given that there are some ubiquitous elements, what does “just text” mean? It means the following:
ubiq.inlines = db.inlinemediaobject
             | db.anchor
             | db.indexterm
             | db.remark
docbook.text = text | ubiq.inlines
             | text.phrase | text.replaceable
Anywhere that character data is allowed, so is inlinemediaobject (because it's the traditional DocBook way of allowing special characters; less necessary in XML but still valuable enough in legacy terms to justify inclusion), anchor, indexterm, remark, and special forms of phrase and replaceable.
What's special about phrase and replaceable in this context is that they contain “just text”. In contexts where all inlines are allowed, all inlines are allowed inside phrase and replaceable too.
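To make the two-tier rule concrete, here is a rough sketch in RELAX NG compact syntax. It is a fragment, not a piece of the prototype: all.phrase, db.filename, and general.inlines are names I made up for illustration, and the ubiquitous patterns are assumed to be defined elsewhere.

```rnc
# Fragment, not a complete grammar. db.inlinemediaobject, db.anchor,
# db.indexterm, and db.remark are assumed to be defined elsewhere.
# all.phrase, db.filename, and general.inlines are hypothetical names.

ubiq.inlines = db.inlinemediaobject | db.anchor | db.indexterm | db.remark
docbook.text = text | ubiq.inlines | text.phrase | text.replaceable

# "Just text" forms: content is limited to docbook.text.
text.phrase      = element phrase      { docbook.text* }
text.replaceable = element replaceable { docbook.text* }

# Form used where every inline is allowed: same element name, wider content.
all.phrase = element phrase { (docbook.text | general.inlines)* }

# A hypothetical "just text" inline, and a placeholder for the other inlines.
db.filename     = element filename { docbook.text* }
general.inlines = all.phrase | db.filename
```

The point of the sketch is that phrase shows up under two patterns with different content models, which is exactly what a DTD could not express.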
Normalizing Metadata
DocBook has a dozen or more flavors of metadata wrapper (bookinfo, chapterinfo, sidebarinfo, etc.). It has all these flavors because DTDs only allow one content model per element name and we wanted to provide some way for customizers to require or restrict metadata in different contexts.
RELAX NG removes the restriction that there can only be one content model per element name and allows us to replace all of these multifarious elements with a single wrapper: info.
Out-of-the-box, info comes in three flavors: with a required title, with an optional title, and with titles forbidden. The grammar is arranged so that customizers who need or want to add more flavors can easily do so, without adding more element names.
I've also taken the liberty of enforcing two additional rules:
- title, titleabbrev, and subtitle must appear first (and in that order) if they're allowed or required, and they may appear only once.
- Titles are only allowed inside info. You can't have them outside anymore.
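As an illustration of how one element name can carry three content models, here is a minimal sketch in RELAX NG compact syntax. The pattern names, the stand-in metadata elements, and the choice of which context gets which flavor are all assumptions made for the example; only the element names info, title, titleabbrev, and subtitle come from the text.

```rnc
# Sketch only. Pattern names and the contents of db.info.elements are
# hypothetical; they are not taken from the prototype.

db.title       = element title       { text }
db.titleabbrev = element titleabbrev { text }
db.subtitle    = element subtitle    { text }

# Fixed order, each at most once. This sketch assumes titleabbrev and
# subtitle never appear without a title.
db.titles = db.title, db.titleabbrev?, db.subtitle?

# Placeholder for the rest of the metadata.
db.info.elements = element author { text } | element date { text }

# Three flavors, one element name. Titles come first when present.
db.info.title.req    = element info { db.titles,  db.info.elements* }
db.info.title.opt    = element info { db.titles?, db.info.elements* }
db.info.title.forbid = element info { db.info.elements* }

# Each context picks the flavor it needs.
db.para    = element para    { db.info.title.forbid?, text }
db.sidebar = element sidebar { db.info.title.opt?, db.para+ }
db.chapter = element chapter { db.info.title.req, (db.para | db.sidebar)+ }

start = db.chapter
```

A customizer who wants a fourth flavor would add another pattern that also produces element info and reference it from the contexts that need it.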
Discarding Cruft
Some stuff just has to go. I have no doubt that for every element in DocBook, there's a user somewhere. But I believe experience suggests that some of them are not worth the complexity they carry.
My list of candidates for deletion:
- msgset. And perhaps more controversially, simplemsgset.
- graphic, inlinegraphic, graphicco.
- sgmltag. Replaced by xmltag.
- authorblurb. Replaced by personblurb.
- toc and lot. Replaced by much simpler toc markup.
- caption. Maybe we should allow captions on figures, but allowing them on mediaobject is clunky.
- modespec, invpartnumber, pubsnumber, isbn, and issn (use biblioid), structname, structfield, medialabel, interface, action, property, otheraddr, contractnum, contractsponsor, corpauthor (author now allows either a personname or an orgname), corpname (replaced by orgname), beginpage (good riddance!), ackno, alt, and collabname.
- Also segmentedlist.
- link, olink, ulink. Replaced by ubiquitous linking. Every element can have either a linkend attribute or an href attribute.
- Enumerated section elements (sect1, sect2, refsect1, etc.). Again, these exist because there was no other way to limit recursive depth in DTDs. In RELAX NG, you can do it without forcing the author to think about the element names.
Too aggressive, perhaps. Or not aggressive enough. Certainly not a finished, final list.
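Two of the items above, ubiquitous linking and depth-limited sections, are easier to picture with a small RELAX NG compact syntax sketch. The pattern names, the datatypes, and the depth limit of three are my assumptions for the example, not details of the prototype.

```rnc
# Sketch only: pattern names, datatypes, and the depth limit of three
# are hypothetical.

# Ubiquitous linking: at most one of linkend or href on any element.
db.linking.attributes =
  (attribute linkend { xsd:IDREF } | attribute href { xsd:anyURI })?

db.para = element para { db.linking.attributes, text }

# One element name, with nesting capped at three levels.
db.section.3 = element section { db.linking.attributes, db.para+ }
db.section.2 = element section { db.linking.attributes, db.para+, db.section.3* }
db.section.1 = element section { db.linking.attributes, db.para+, db.section.2* }

start = db.section.1
```

The author only ever types section; the grammar, not the element name, decides how deep the nesting may go.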
Miscellany
Finally, I've made some organizational changes. Some of these are documented as future use changes in V4.0, some are not.
In no particular order:
- The components of a personal name (firstname, surname, etc.) are no longer allowed free-standing. You have to wrap them in a personname.
- I've explicitly allowed both CALS and HTML table models. RELAX NG lets us segregate them so there's no overlap: it's exactly one or exactly the other. Perhaps HTML tables should (also or only?) be allowed in the XHTML namespace?
- I removed the format attribute from verbatim environments.
- I dropped the class attribute from productname.
- I made title mandatory on equation.
- I removed the srccredit attribute from imagedata and friends. Those elements now allow info and the credit can more properly go there.
- I removed contrib; use othercredit instead.
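The table segregation can be sketched along these lines in RELAX NG compact syntax. The content models are drastically trimmed and the pattern names are invented for the example; real CALS and HTML tables have many more parts.

```rnc
# Sketch only: pattern names and the trimmed-down content models are
# hypothetical.

db.table = db.cals.table | db.html.table

# CALS flavor: one or more tgroup children.
db.cals.table = element table { db.tgroup+ }
db.tgroup = element tgroup { attribute cols { xsd:positiveInteger }, db.tbody }
db.tbody  = element tbody  { db.row+ }
db.row    = element row    { db.entry+ }
db.entry  = element entry  { text }

# HTML flavor: tr children directly, never tgroup.
db.html.table = element table { db.tr+ }
db.tr = element tr { (db.td | db.th)+ }
db.td = element td { text }
db.th = element th { text }

start = db.table
```

Because each alternative is a complete content model for table, a document that mixes tgroup and tr inside one table matches neither and is rejected.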
A Prototype
All this work resulted in a prototype and a stylesheet that converts (some) DocBook V4.2 documents to conform to the prototype.
One important change that I haven't made (yet) is putting DocBook in a namespace. But we should.