Specialization and Extensibility

Volume 9, Issue 12; 27 Jan 2006; last modified 08 Oct 2010

I don't think DITA's notion of specialization is really very exciting and it sure doesn't make DITA more extensible than DocBook.

As I've said, to the extent that there is a “which is better, DocBook or DITA?” debate, I'm trying to stay out of it. I'm not an impartial observer and I don't relish the role of “defender of DocBook”. Nor do I find any prospect of pleasure in “attacking DITA”. So mostly, I ignore the issue.

Mostly.

Edd Dumbill’s piece, “Lovely DITA, DocBook fades?”, has been floating on a tab in my browser all week. It's a fair and balanced essay, generally pretty favorable towards DocBook, title notwithstanding, so I've got nothing to complain about…except for the part where he says:

“DocBook is a fixed element and attribute set, DITA is extensible, allowing the definition of custom information types”

That's just flatly wrong. The part about DocBook, anyway. DocBook has always been one of the most extensible schemas around. We didn't put thousands of parameter entities in the DTD version just so we could watch early SGML parsers crash. Nor have we carefully and deliberately established extensible patterns throughout the RELAX NG version solely for our own amusement.

What Edd is appealing to here is DITA's notion of “specialization”. The marketing folks behind DITA products and consultancies have latched onto specialization as if it were some sort of silver bullet to slay the document interchange monster. A little web searching for “DITA specialization” will turn up plenty of hype.

I'm not saying it isn't useful, but let's consider what it really means. The idea behind specialization is that when I invent a new element, I declare what existing element it “specializes”. In theory, this declaration allows a tool processing my document to fall back to some default processing if it doesn't understand my specialization. The thing to remember is that the extent to which this fallback processing is useful or correct depends largely on how much the semantics of your new element matter.

Let's consider a case where it works reasonably well. Suppose I make lots of references to Wikipedia in my writing. Every time I make a Wikipedia reference, I type markup like this:

<link xlink:href="http://www.wikipedia.org/wiki/Thing">Thing</link>

After a while, I get tired of this and decide that I'm going to craft an extension to simplify my life. Instead of writing it all out, I create a wikipedia element that specializes link. (Whether this particular specialization is possible in DITA, I don't know. But the principle stands regardless.) Now I can just write:

<wikipedia>Thing</wikipedia>

I'll customize my system to handle the wikipedia tag, but if I send the document to you without my customizations, your system will fall back to link processing. That means you won't get exactly the right output, but it'll be pretty close. You'll get the right text, but not the link.
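To make the fallback concrete, here is a minimal Python sketch of the idea, not of how any DITA or DocBook tool actually works. The `SPECIALIZES` table, the handler functions, and `render` are all hypothetical names invented for illustration, and the link attribute is simplified to a plain `href`.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialization table: new element -> the element it specializes.
SPECIALIZES = {"wikipedia": "link"}

def render_link(elem):
    # Base-schema handler: emits a hyperlink only when an href is present.
    href = elem.get("href")
    text = elem.text or ""
    return f'<a href="{href}">{text}</a>' if href else text

def render_wikipedia(elem):
    # Custom handler: derives the URL from the element's text.
    text = elem.text or ""
    return f'<a href="http://www.wikipedia.org/wiki/{text}">{text}</a>'

def render(elem, handlers):
    # Walk up the specialization chain until a known handler is found.
    name = elem.tag
    while name is not None:
        if name in handlers:
            return handlers[name](elem)
        name = SPECIALIZES.get(name)
    return elem.text or ""  # nothing in the chain is known: plain text

doc = ET.fromstring("<wikipedia>Thing</wikipedia>")

# My system knows the customization; yours only knows the base schema.
mine = render(doc, {"link": render_link, "wikipedia": render_wikipedia})
yours = render(doc, {"link": render_link})
```

With my handlers the element becomes a real hyperlink; with only the base handlers it degrades to the bare text, which is the "right text, but not the link" outcome described above.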

On the other end of the spectrum, suppose you want to add something with quite specific semantics. EBNF diagrams, for example. Here you're going to have to have elements like production, left-hand-side, and non-terminal. There's unlikely to be anything in the base schema with semantics that are even close to that, so you'll probably end up having to specialize paragraphs and phrases. If you take a quick look at what a typical EBNF diagram is supposed to look like, you'll have no difficulty imagining how completely unusable the fallback presentation of that markup is going to be.
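A rough Python sketch shows how badly this degrades. Everything here is an assumption made for illustration: the `right-hand-side` element, the base element names `para` and `phrase`, the `SPECIALIZES` table, and both renderers. It is not a model of any real processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarations: each EBNF element and the base element it specializes.
SPECIALIZES = {
    "production": "para",
    "left-hand-side": "phrase",
    "right-hand-side": "phrase",
    "non-terminal": "phrase",
}

def base_name(tag):
    # Follow the chain until we reach an element that is not a specialization.
    while tag in SPECIALIZES:
        tag = SPECIALIZES[tag]
    return tag

def inner(elem, render):
    # The element's own text plus its rendered children, in document order.
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child))
        parts.append(child.tail or "")
    return "".join(parts)

def render_fallback(elem):
    # A processor that only understands para and phrase.
    body = inner(elem, render_fallback)
    return f"<p>{body}</p>" if base_name(elem.tag) == "para" else body

def render_ebnf(elem):
    # A processor customized for the EBNF elements.
    if elem.tag == "production":
        lhs = render_ebnf(elem.find("left-hand-side"))
        rhs = render_ebnf(elem.find("right-hand-side"))
        return f"<p>{lhs} ::= {rhs}</p>"
    if elem.tag == "right-hand-side":
        return " ".join(render_ebnf(child) for child in elem)
    return inner(elem, render_ebnf)

SOURCE = (
    "<production>"
    "<left-hand-side>Expr</left-hand-side>"
    "<right-hand-side>"
    "<non-terminal>Term</non-terminal>"
    "<non-terminal>Factor</non-terminal>"
    "</right-hand-side>"
    "</production>"
)
doc = ET.fromstring(SOURCE)
custom = render_ebnf(doc)
fallback = render_fallback(doc)
```

The customized renderer produces a readable production. The fallback renderer keeps every character of text but runs the symbols together in one paragraph, with no `::=` and no separation between them, so the structure that made it a grammar rule is gone.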

I've already shown how the notion of specialization could easily be added to DocBook, and there might be some value in doing so, but it would be a foolish mistake to believe that such a feature was going to fix all your interchange problems. Interchange is a complex issue that needs to be approached thoughtfully.

And there's nothing about it that makes DITA more extensible than DocBook.

It's all mostly irrelevant anyway. If experience with DocBook is indicative, and I think it is, very few users are ever going to make any customizations to the markup at all. Sure, some big companies will hire consultants to craft customizations for them, but they're in the minority. Most users just grab the schema and start using it.

Comments

While I agree with your main argument here, you then go on to say that "very few users are ever going to make any customizations to the markup at all".

Is that so? I use DocBook as the source format of my web pages, and I've done some customizations. It was straightforward to do, so I'd imagine companies doing it as well. On the other hand, maybe for them it is more important to stick to the standard so that there are no surprises later?