The DocBook Encoding Initiative or “TextBook”?

Volume 6, Issue 95; 09 Oct 2003

In a lot of ways, DocBook and the TEI are very similar. I spent most of today looking over the TEI Meta language and the constructs in DocBook and the TEI. Maybe it’s possible to design our schemas so that they can easily interoperate. In any event, a few touristy snaps of Oxford as well.

An ill-humoured man is a prisoner at the mercy of an enemy from whom he can never escape.

Sa'di

I think there’s always been some good-humored competition between DocBook and the TEI. Or maybe it’s just between Sebastian and myself; his invitation to join the TEI Meta working group came in a message with the subject “Working with the enemy?” It was an invitation that I accepted gladly.

In a lot of ways, DocBook and the TEI are very similar: they are both large, rich schemas designed for marking up textual documents in ways that convey semantic information relevant to a human reader. And while each has particular strengths, to a certain extent, both can stretch to fill the other's shoes: many of the Oxford University Computing Services web pages are created from a TEI extension, and I’ve seen works on Islamic architecture authored in DocBook.

The working groups responsible for DocBook and the TEI are both considering how to migrate to their next major versions (coincidentally, both of which will be version 5). The groups are also, less coincidentally I think, both actively exploring RELAX NG as the natural schema language in which to express their designs.

Since I was already on this side of the pond for other business and a few days' vacation, I jumped at the chance to come down to Oxford for a day and meet face-to-face.

Church at Dusk
Evening Church

So Sebastian Rahtz, Lou Burnard, and I spent most of today looking over the TEI Meta language and the markup constructs in DocBook and the TEI. As we pored over the schemas, we talked about the best way to express one constraint or another. The TEI is built from a true literate programming system, so there were some interesting issues to discuss.

The Royal Oak

One topic that we came back to many times was whether it would be possible to design our schemas so that they could easily import modules from each other. For example, could DocBook be structured so that the TEI could easily import the GUI inlines if someone wanted to write a book about computer software in TEI? Or could the TEI be structured so that DocBook could easily import markup for dictionary entries if someone wanted to write a dictionary of computer terms?

In a DTD world, I think this would have been practically impossible. But RELAX NG deals a lot more intelligently with extension: pattern definitions can be combined, so constructs like interleave can be used to extend a pattern without completely redefining it. (In DTDs, a parameter entity can effectively be defined only once, because the first declaration wins, so you really have to jump through hoops.)

With a little coordination, I think we’ll be able to define “link points,” where DocBook and the TEI share common pattern names. I believe that will make it possible to insert modules across schema boundaries at those points.

Whose Namespace is This?

For inline elements and elements with very specific content models, it seems reasonable to use namespaces. So a DocBook guimenuitem might be a db:guimenuitem in the TEI (assuming we put DocBook in a namespace).
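To make the namespace idea concrete, here is a minimal sketch of a TEI paragraph borrowing a DocBook GUI inline. The namespace URIs are assumptions chosen for illustration, and the fragment isn't checked against any real schema; it only shows that a namespace-aware parser keeps the borrowed `db:guimenuitem` distinct from the surrounding TEI markup.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs, for illustration only.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "db": "http://docbook.org/ns/docbook",
}

# A TEI paragraph that borrows a DocBook GUI inline via the db: prefix.
TEI_FRAGMENT = """\
<p xmlns="http://www.tei-c.org/ns/1.0"
   xmlns:db="http://docbook.org/ns/docbook">
  Choose <db:guimenuitem>Save As</db:guimenuitem> from the menu.
</p>"""


def borrowed_inlines(xml_text):
    """Return the text of every DocBook guimenuitem below the root."""
    root = ET.fromstring(xml_text)
    # The 'db:' prefix in the path is resolved through the NS mapping.
    return [el.text for el in root.findall(".//db:guimenuitem", NS)]


if __name__ == "__main__":
    root = ET.fromstring(TEI_FRAGMENT)
    print(root.tag)  # {http://www.tei-c.org/ns/1.0}p
    print(borrowed_inlines(TEI_FRAGMENT))  # ['Save As']
```

The paragraph itself stays a TEI `p`; only the inline carries the DocBook namespace, which is the situation where namespaces seem least controversial.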

Now consider the following scenario. Suppose it’s possible to configure DocBook and the TEI so that DocBook can import the TEI module that supports markup for plays. It’s a stretch, but maybe you’re writing a book that uses dialog between two programmers to make your points.

At some point inside the dialog markup, you’re going to get to “paragraphs,” and you’re going to want DocBook para elements, not TEI p elements. If DocBook and the TEI coordinate well, customizations like this will work just fine. The module for plays will interleave itself into the right patterns so that it can be used in DocBook, and it will refer to some pattern for “paragraph content” that will be appropriately defined in each schema.

What namespace should the play elements be in? Are they TEI elements with DocBook content, or are they DocBook elements isomorphic to the TEI elements? Should DocBook and TEI be in the same namespace?

(Maybe you disagree and think that after you switch to TEI, it’s TEI all the way to the leaves. That’s a defensible position, but I don’t think it’s as interesting or useful.)
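For the first of those options (TEI play elements with DocBook content), a mixed document might look something like the following minimal sketch. The namespace URIs, the use of TEI `sp` and `speaker` for the dialog, and the `namespace_census` helper are all assumptions for illustration; the sketch just counts which elements end up in which namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DB = "http://docbook.org/ns/docbook"   # assumed DocBook namespace URI
TEI = "http://www.tei-c.org/ns/1.0"    # assumed TEI namespace URI

# A DocBook chapter holding TEI speech markup (sp, speaker) whose
# paragraph-level content is DocBook para rather than TEI p.
MIXED = f"""\
<chapter xmlns="{DB}" xmlns:tei="{TEI}">
  <title>A Dialog on Markup</title>
  <tei:sp>
    <tei:speaker>Alice</tei:speaker>
    <para>Why not use TEI all the way to the leaves?</para>
  </tei:sp>
  <tei:sp>
    <tei:speaker>Bob</tei:speaker>
    <para>Because the rest of the book is DocBook.</para>
  </tei:sp>
</chapter>"""


def split_name(tag):
    """Split ElementTree's '{uri}local' notation into (uri, local)."""
    if not tag.startswith("{"):
        return "", tag  # element in no namespace
    uri, _, local = tag[1:].partition("}")
    return uri, local


def namespace_census(xml_text):
    """Count elements per (namespace URI, local name) pair."""
    counts = Counter()
    for el in ET.fromstring(xml_text).iter():
        counts[split_name(el.tag)] += 1
    return counts


if __name__ == "__main__":
    for (uri, local), n in sorted(namespace_census(MIXED).items()):
        prefix = "db" if uri == DB else "tei"
        print(f"{prefix}:{local} x{n}")
```

Putting DocBook and the TEI in the same namespace would collapse these two buckets into one, which is exactly the trade-off the question above is asking about.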

Comments

The link to TEI at the top goes to "Tax Executives Institute". I assume you meant http://www.tei-c.org/

—Posted by James Henstridge on 27 Oct 2003 @ 05:18 UTC

Indeed. A natural hazard of off-line composition, I suppose.

—Posted by Norman Walsh on 27 Oct 2003 @ 05:31 UTC

I think you spelled Sebastian's name wrong. Yes, write a book on TEI / Docbook interleaving...(grin) There are hard decisions to know what markup to use...I think it all boils down to tools... but what do I know. I'm using LaTeX currently due to the toolset and the finer granularity in this.

Les

—Posted by Les Richardson on 30 Nov 2003 @ 04:39 UTC

Ack. Thanks, Les, and sorry, Sebastian. Fixed.

—Posted by Norman Walsh on 04 Dec 2003 @ 02:35 UTC

You spelled my name wrong too. You might also like to fix the URL for the OUCS website: it should be http://www.oucs.ox.ac.uk Nice pictures though.

Happy new year!

Lou

—Posted by Lou Burnard on 31 Dec 2003 @ 05:44 UTC

Errr. No. 1. Different purposes? 2. V.little overlap[docbook vs tei] 3. SR steals from charities.

—Posted by Dave Pawson on 31 Dec 2003 @ 06:25 UTC