Some ideas about what a refactored DocBook might look like, and a prototype.
It doesn't seem quite fair to suggest scrapping DocBook without at least considering what should replace it. And rather than waiting until I'm finished, I think it probably makes sense to publish what I've cooked up. It's maybe three-quarters finished, maybe a little more. In any event, it's just one guy's idea.
In general terms, my changes fall into four categories: rationalize the content model of inlines, normalize the metadata, discard cruft, and make changes that appear (to me) to simplify things.
I've divided inlines into three classes: ubiquitous inlines (ones that should be available everywhere), general inlines, and domain-specific inlines.
In trying to find a design principle to discriminate between what should go in the content model of a particular inline and what should not, I eventually settled on a simple one: any given inline contains just text or it contains every inline. In my prototype, a lot of inlines contain just text.
Given that there are some ubiquitous elements, what does “just text” mean? It means the following:
    ubiq.inlines = db.inlinemediaobject
                 | db.anchor
                 | db.indexterm
                 | db.remark
    docbook.text = text | ubiq.inlines
                 | text.phrase | text.replaceable
Anywhere that character data is allowed, so are <inlinemediaobject> (because it's the traditional DocBook way of allowing special characters; less necessary in XML but still valuable enough in legacy terms to justify inclusion), <anchor>, <indexterm>, <remark>, and special forms of <phrase> and <replaceable>.
What's special about <phrase> and <replaceable> in this context is that they contain “just text”. In contexts where all inlines are allowed, they're allowed inside <phrase> and <replaceable> as well.
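To make the "just text or every inline" rule concrete, here is a minimal sketch in RELAX NG compact syntax. It is not the prototype's grammar: the ubiquitous set is cut down to two elements, and `all.inlines`, `db.filename`, `db.emphasis`, and `db.para` are hypothetical names picked for illustration.

```rnc
# Sketch only. Names other than ubiq.inlines and docbook.text are assumptions.
start = db.para

# A cut-down set of ubiquitous inlines.
ubiq.inlines = db.anchor | db.remark

# "Just text": character data plus the ubiquitous inlines, nothing else.
docbook.text = (text | ubiq.inlines)*

# "Every inline": character data, the ubiquitous inlines, and all other inlines.
all.inlines = (text | ubiq.inlines | db.emphasis | db.filename)*

db.anchor   = element anchor { attribute id { text } }
db.remark   = element remark { text }
db.filename = element filename { docbook.text }   # contains just text
db.emphasis = element emphasis { all.inlines }    # contains every inline
db.para     = element para { all.inlines }
```

Under this sketch, `<filename>` may hold a `<remark>` but never an `<emphasis>`, while `<emphasis>` may hold anything that `<para>` can.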
DocBook has a dozen or more flavors of metadata wrapper (<sidebarinfo>, etc.). It has all these flavors because DTDs only allow one content model per element name, and we wanted to provide some way for customizers to require or restrict metadata in different contexts.
RELAX NG removes the restriction that there can only be one content model per element name and allows us to replace all of these multifarious elements with a single wrapper: <info>.
<info> comes in three flavors: with a required title, with an optional title, and with titles forbidden. The grammar is arranged so that customizers who need or want more flavors can easily add them without adding more element names.
I've also taken the liberty of enforcing two additional rules. First, the title elements (<title> through <subtitle>) must appear first, in a fixed order, if they're allowed or required, and each may appear only once. Second, titles are only allowed inside <info>. You can't have them outside anymore.
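A rough sketch of how one element name can carry three content models in RELAX NG compact syntax. The pattern names (`db.titlereq.info` and so on) are hypothetical, the metadata is reduced to a bare `<author>`, and only `<title>` and `<subtitle>` are modeled, so treat this as an illustration of the idea and not as the prototype's actual definitions.

```rnc
# Sketch only. All pattern names are assumptions made for illustration.
start = db.book

db.title    = element title { text }
db.subtitle = element subtitle { text }
db.author   = element author { text }
db.para     = element para { text }

# One element name, three content models. Titles come first and occur once.
db.titlereq.info    = element info { db.title, db.subtitle?, db.author* }
db.titleopt.info    = element info { (db.title, db.subtitle?)?, db.author* }
db.titleforbid.info = element info { db.author* }

# Each context picks the flavor it wants.
db.book      = element book { db.titlereq.info, (db.para | db.sidebar | db.imagedata)* }
db.sidebar   = element sidebar { db.titleopt.info?, db.para+ }
db.imagedata = element imagedata { attribute fileref { text }, db.titleforbid.info? }
```

A customizer who wants a fourth flavor would define another pattern around `element info` and reference it from the relevant context; no new element name is needed.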
Some stuff just has to go. I have no doubt that for every element in DocBook, there's a user somewhere. But I believe experience suggests that some of them are not worth the complexity they carry.
My list of candidates for deletion:
<msgset>. And perhaps more controversially
<sgmltag>. Replaced by
<authorblurb>. Replaced by
<lot>. Replaced by much simpler
<caption>. Maybe we should allow captions on figures, but allowing them on
<author> now allows either a
<ulink>. Replaced by ubiquitous linking. Every element can have either a linkend attribute or an
Enumerated section elements (<refsect1>, etc.). Again, these exist because there was no other way to limit recursive depth in DTDs. In RELAX NG, you can do it without forcing the author to think about the element names.
Too aggressive, perhaps. Or not aggressive enough. Certainly not a finished, final list.
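On the point about enumerated sections, here is one way a depth limit might be expressed in RELAX NG compact syntax while the author only ever types `<section>`. The pattern names and the limit of three levels are assumptions chosen for the example.

```rnc
# Sketch only. Pattern names and the three-level limit are assumptions.
start = element article { db.info, db.para*, section.depth1* }

db.info = element info { element title { text } }
db.para = element para { text }

# The same element name at every level. The patterns, not the names, cap the depth.
section.depth1 = element section { db.info, db.para*, section.depth2* }
section.depth2 = element section { db.info, db.para*, section.depth3* }
section.depth3 = element section { db.info, db.para* }
```

A fourth nested `<section>` fails validation, and a customizer who wants unlimited depth can replace the three patterns with a single recursive one.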
Finally, I've made some organizational changes. Some of these are documented as future use changes in V4.0, some are not.
In no particular order:
The components of a personal name (<surname>, etc.) are no longer allowed free-standing. You have to wrap them in a <personname>.
I've explicitly allowed both CALS and HTML table models. RELAX NG lets us segregate them so there's no overlap: it's exactly one or exactly the other. Perhaps HTML tables should (also or only?) be allowed in the XHTML namespace?
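A minimal sketch of that segregation in RELAX NG compact syntax: two patterns share the element name `table`, and a document must match one or the other in full. Both models are drastically reduced, titles and most attributes are left out, and the pattern names are assumptions.

```rnc
# Sketch only. Both table models are heavily simplified.
start = db.table

# Exactly one model or exactly the other, under a single element name.
db.table = db.cals.table | db.html.table

db.cals.table = element table {
  element tgroup {
    attribute cols { xsd:positiveInteger },
    element tbody {
      element row { element entry { text }+ }+
    }
  }+
}

db.html.table = element table {
  element tr {
    (element th { text } | element td { text })+
  }+
}
```

A `<table>` that mixes `<tgroup>` and `<tr>` children matches neither branch, so it is rejected.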
I removed the format attribute from verbatim environments.
I dropped the
I removed the srccredit attribute from <imagedata> and friends. Those elements now allow <info>, and the credit can more properly go there.
All this work resulted in a prototype and a stylesheet that converts (some) DocBook V4.2 documents to conform to the prototype.
One important change that I haven't made (yet) is putting DocBook in a namespace. But we should.