Reconfigurable RELAX NG Grammars
RELAX NG is the future for DocBook. But getting a working RELAX NG grammar is only a small part of the battle. We also need to satisfy the requirements of a reasonable evolution path for DocBook. It's going to be a challenge, but a fun one, I think!
This essay is part of my ongoing exploration of what a refactored DocBook might look like. As I've said before, these are my thoughts as I hold them today. They're nobody else's and I reserve the right to change my mind later.
I'm convinced that RELAX NG is the future for DocBook. And building a RELAX NG grammar for DocBook isn't hard. But just having the resulting grammar isn't going to satisfy the requirements as I see them. I want to build a system that will also support the following requirements:
- It must be possible to generate DTDs and XML Schemas from the RELAX NG grammar.
- I want to take advantage of the additional expressive power of RELAX NG to more accurately reflect the intended semantics of DocBook. For example, there are several elements that have a pair of attributes, like these:

      attribute class { "doi" | "isbn" | ... | "other" }
      attribute otherclass { text }

  The intended semantic is that if “other” is chosen for the class attribute, the appropriate other value should be given in the otherclass attribute. With RELAX NG, it's possible to express these co-constraints and I wish to do so. There's an even more interesting case surrounding the use of titles either inside or outside the appropriate info wrapper.
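As a rough illustration of the class/otherclass co-constraint described above, here is a minimal sketch in the compact syntax. The pattern name biblioid.class.attrib is hypothetical, biblioid merely stands in as one plausible host element, and the enumeration is cut down to two values for brevity; none of this is taken from an actual DocBook grammar.

```rnc
# Sketch only: names are illustrative and the value list is abbreviated.
# Either class takes one of the enumerated values and otherclass is absent,
# or class is "other" and otherclass is required.
biblioid.class.attrib =
  attribute class { "doi" | "isbn" }
  | (attribute class { "other" },
     attribute otherclass { text })

biblioid = element biblioid { biblioid.class.attrib, text }

start = biblioid
```

Under this sketch, an element with class="other" and an otherclass attribute validates, while class="doi" together with otherclass does not, and neither does class="other" on its own. The dependency lives entirely in the choice between the two attribute groups.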
It's pretty obvious that these two goals are in direct conflict with each other. There's no way to express the semantics of the latter example in a DTD or a W3C XML Schema. Trang does a nice job of generalizing to make DTDs and XML Schemas out of RELAX NG grammars, but I'm not sure it can practically be expected to unwind my intentions in cases like these.
That means arranging for an automated transformation that can produce, for some generalization of DocBook, a RELAX NG grammar that is at least mostly deterministic and that we can hand to Trang.
Another requirement I want to meet has to do with the ease of use of DocBook as a system, rather than of the grammar in particular.
- I want users to be able to mix-and-match subsets and supersets with relative ease. In the current DTD, there are all sorts of parameter entities that allow one to make subsets or extensions, but that configurability isn't exposed to the casual user.

  Quick! Write a subset of DocBook that doesn't have any trace of msgset or refentrys. Or, quick!, add a new kind of admonition to DocBook called alert so that it has the right content model and can appear everywhere that the existing admonitions are allowed.
RELAX NG already has facilities for grammar extension and grammar redefinition, but I'm looking for something even easier. (Maybe that's a mistake, maybe I shouldn't be going here.)
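For comparison, here is roughly what the two "quick!" challenges look like using only the built-in extension and redefinition facilities. This is a sketch under loud assumptions: the file name docbook.rnc and the pattern names msgset, refentry, admonition.class, and admonition.content are all hypothetical, and it only works if the base grammar happens to expose named patterns shaped like these.

```rnc
# Hypothetical base grammar and pattern names throughout.
include "docbook.rnc" {
  # Redefinition: replace the unwanted element patterns so that
  # they can never match anywhere they are referenced.
  msgset = notAllowed
  refentry = notAllowed
}

# Extension: add alert as one more alternative in the (assumed)
# admonition choice, reusing the (assumed) shared content model.
admonition.class |= alert
alert = element alert { admonition.content }
```

Even in this small case the user has to know which patterns exist and how they are wired together, and definitions that were only reachable through the removed elements are left behind as dead weight. That is the gap an easier, module-level mechanism would be meant to close.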
Frankly, I want something more like the TEI Pizza Chef. Users should be able to say, “I want DocBook with HTML Tables but without CALS Tables or MsgSet and its ilk or callouts.” For some set of predefined modules, they should just be able to push a button and get a subset like that.
Naturally, whatever system we use for this has to be robust enough to allow a more experienced user to configure additional modules this way.
I think it would be possible to take this idea too far; I don't see any value in building a system that allows you to select each individual element. There are just too many variations that won't be useful. Do you really need to be able to trivially select programlistingco without also getting screenco? I don't think so.
Some proto-version of my initial exploration of these ideas is starting to take shape. There's no distribution for it yet (and please don't ask, it's just too early), but you can get it from CVS. It's in the docbook/relaxng directory, if you already have the DocBook repository checked out.
I'll write up some notes about how it works tomorrow.
Fair warning: you'll need Make, Saxon (or your EXSLT-aware XSLT processor of choice), Trang, and (optionally) Perl installed and ready to go.
In a nutshell: author in the compact syntax, leaving out the DocBook idioms that can be machine-generated (like the role and common attributes for every element); add annotations to describe complex structures that we know will need to be rendered very differently for deterministic languages; add annotations for easy modularity; convert our modules to the XML syntax; combine the requested modules together; and massage lightly to build the final grammar.