Reconfigurable RELAX NG Grammars

Volume 6, Issue 84; 15 Sep 2003

RELAX NG is the future for DocBook. But getting a working RELAX NG grammar is only a small part of the battle. We also need to satisfy the requirements of a reasonable evolution path for DocBook. It's going to be a challenge, but a fun one, I think!

This essay is part of my ongoing exploration of what a refactored DocBook might look like. As I've said before, these are my thoughts as I hold them today. They're nobody else's and I reserve the right to change my mind later.

I'm convinced that RELAX NG is the future for DocBook. And building a RELAX NG grammar for DocBook isn't hard. But just having the resulting grammar isn't going to be enough. I want to build a system that also meets the following requirements:

  1. It must be possible to generate DTDs and XML Schemas from the RELAX NG grammar.

  2. I want to take advantage of the additional expressive power of RELAX NG to more accurately reflect the intended semantics of DocBook. For example, there are several elements that have a pair of attributes, like these:

    attribute class { "doi" | "isbn" | ... | "other" }
    attribute otherclass { text }

    The intended semantic is that if “other” is chosen for the class attribute, the appropriate other value should be given in the otherclass attribute. With RELAX NG, it's possible to express these co-constraints and I wish to do so. There's an even more interesting case surrounding the use of titles either inside or outside the appropriate info wrapper.
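To make the co-constraint concrete, here is a minimal sketch in the compact syntax. The element name biblioid and the pattern name biblioid.class are assumptions chosen for illustration, and only the two class values spelled out above are listed; the remaining values are left out rather than guessed at. This is one way to write it, not necessarily how the final grammar will.

    start = element biblioid { biblioid.class, text }

    biblioid.class =
      # One of the predefined values: otherclass is not permitted.
      attribute class { "doi" | "isbn" }
      # The value "other": otherclass becomes required.
      | (attribute class { "other" }, attribute otherclass { text })

The choice between the two branches is what carries the constraint: each branch mentions class exactly once, and otherclass appears only alongside “other”.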

It's pretty obvious that these two goals are in direct conflict with each other. There's no way to express the semantics of a co-constraint like that one in DTDs or W3C XML Schema. Trang does a nice job of generalizing to make DTDs and XML Schemas out of RELAX NG grammars, but I'm not sure it can practically be expected to unwind my intentions in cases like these.

That means arranging for an automated transformation that produces a RELAX NG grammar for some generalization of DocBook, one that is at least mostly deterministic, so that we can hand it to Trang.

Another requirement I want to achieve has to do with the ease of use of DocBook as a system, not particularly of the grammar.

  1. I want users to be able to mix-and-match subsets and supersets with relative ease. In the current DTD, there are all sorts of parameter entities that allow one to make subsets or extensions, but that configurability isn't exposed to the casual user.

    Quick! Write a subset of DocBook that doesn't have any trace of msgset or refentry. Or, quick! Add a new kind of admonition to DocBook called alert so that it has the right content model and can appear everywhere that the existing admonitions are allowed.

RELAX NG already has facilities for grammar extension and grammar redefinition, but I'm looking for something even easier. (Maybe that's a mistake, maybe I shouldn't be going here.)
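For reference, this is roughly what those existing facilities look like when applied to the two challenges above. It is only a sketch: the file name docbook.rnc and the pattern names msgset, admonition.class, and admonition.content are hypothetical, standing in for whatever the real grammar ends up calling them.

    # Subset: redefine a pattern while including the base grammar.
    # Every reference to msgset now matches nothing.
    include "docbook.rnc" {
      msgset = notAllowed
    }

    # Extension: add alert as one more alternative wherever the
    # (hypothetical) admonition.class pattern is referenced.
    admonition.class |= alert
    alert = element alert { admonition.content }

This works, but only if you already know which patterns to override and how they are named, which is exactly the knowledge a casual user doesn't have.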

Frankly, I want something more like the TEI Pizza Chef. Users should be able to say, “I want DocBook with HTML Tables but without CALS Tables or MsgSet and its ilk or callouts.” For some set of predefined modules, they should just be able to push a button and get a subset like that.

Naturally, whatever system we use for this has to be robust enough to allow a more experienced user to configure additional modules this way.

I think it would be possible to take this idea too far: I don't see any value in building a system that allows you to select each individual element. There are just too many variations that won't be useful. Do you really need to be able to trivially select programlistingco without also getting screenco? I don't think so.

Some proto-version of my initial exploration of these ideas is starting to take shape. There's no distribution for it yet (and please don't ask, it's just too early), but you can get it from CVS. It's in the docbook/relaxng directory, if you already have the DocBook repository checked out.

I'll write up some notes about how it works tomorrow.

Fair warning: you'll need Make, Saxon (or your EXSLT-aware XSLT processor of choice), Trang, and (optionally) Perl installed and ready to go.

In a nutshell: author in the compact syntax without a bunch of DocBook idioms that can be machine generated (like the role and common attributes for every element), add annotations to describe complex structures that we know will need to be rendered very differently for deterministic languages, add annotations for easy modularity, convert our modules to the XML syntax, combine the requested modules together, and massage lightly to build the final grammar.