Reconsidering specialization, part the first

Volume 13, Issue 27; 30 Aug 2010; last modified 08 Oct 2010

It's been a few years since I first considered DITA specialization. I wonder if I missed the point? I think that might depend on the assumptions that I brought to the table.

One of the most interesting hallway conversations that I had at Balisage was with Eliot Kimber. He and I spent about two hours exploring the differences and similarities between DocBook and DITA.

Here's my synthesis of Eliot's position:

The only difference between DITA and DocBook is specialization, and specialization is why DITA is better.

I'll accept the first part of that observation without argument. To ordinary mortals, DITA and DocBook might look very different, but Eliot is as skilled a markup wrangler as you're likely to encounter. Assuming you worked out a mapping for whatever semantic ambiguity there might be in your corpus, given time and inclination, I'll grant that Eliot could convert anything into anything else. So in that broad sense, they're the same.

That just leaves specialization. I've historically not been impressed by specialization. But I've been wrong before.

In talking to Eliot and thinking about it afterwards, I've come to realize that I'm burdened by a particular set of assumptions, formed long ago, that may no longer usefully guide me through the real world.

One of the critical, motivating goals for adopting an XML (or SGML) publishing system in techpubs was the ability, when all was said and done, to demonstrate a lights-out, high quality, print publication system. You poured valid documents in at the top and aesthetically pleasing, professionally typeset pages that adhered strictly to the organization's design and style guidelines came out the other end.

The way, the only way, that this was achieved was to start with quality tools (editors, formatters, typesetters) and customize each of them, perhaps significantly, until all the stakeholders signed off that the results were up to the required standards of information content, layout, design, and typography.

One consequence of this approach was that every techpubs organization had its own markup vocabulary. Even if they had all started from some common standard (which, mostly, they hadn't), the varying constraints of design, tool chain, and customizer skill invariably led to diversity.

One of the intellectual progenitors of DocBook was a desire to address this problem, specifically in the area of Unix reference page (“man page”) documentation. Back in the early 90's, there were several commercial Unix vendors (no, really!) and they, among others, got together to work on DocBook as an exchange DTD. The idea was that if Vendor A wanted to share the cat reference page with Vendor B, Vendor A would translate their custom markup into DocBook and send that to Vendor B where Vendor B would translate DocBook into their custom markup.

That's why DocBook element names are so long, sometimes almost absurdly so. They were named for maximum clarity of meaning, not authoring convenience.

That was then. This is now?

Lights-out publishing is still a requirement for some organizations, especially in core techpubs, but I think there's also evidence that tools like Adobe InDesign have relaxed this requirement for many more organizations. If you're going to pour the markup I send you through a visual tool and make even a light manual pass over the document, you can afford to be a lot more forgiving about the markup I send you. And a system that is “a little bit forgiving” is vastly, spectacularly easier to implement than one that is “not forgiving at all”.

If the web has taught us anything, it's that quality hardly matters at all. Sturgeon's Law applies. We now routinely accept layouts that no self-respecting publisher would have had the temerity to propose years ago.

And finally, this printing ink on dead trees at a thousand-plus DPI, who does that? It's a rare piece of software or hardware that comes with more than a pamphlet these days. Environmentally, that's probably a good thing, but it has done nothing to improve the quality (see previous point) of what's produced.

How is this related to specialization? It's related to specialization because specialization is about interchange. DocBook has always been about interchange: precise, carefully managed interchange.

Specialization is about blind interchange: I send you my documents, documents that contain markup you've never seen before, you run them through your normal toolchain, and the results are “good enough”.
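To make the mechanics concrete, here is a minimal sketch of the fallback that blind interchange relies on. It assumes the DITA convention that every element carries a class attribute listing its ancestry from base type to most specialized type. The checklist element, the checklist-d module name, and the handler tables are hypothetical, invented for illustration; this is not how any particular DITA processor is implemented.

```python
# Hypothetical sketch: fallback processing driven by a DITA-style class attribute.

def ancestry(class_attr):
    """Return element type names from most to least specialized.

    A DITA class attribute looks like "+ topic/ul checklist-d/checklist ":
    a leading "-" or "+" marker, then module/type pairs listed from the
    base type to the most specialized type.
    """
    tokens = class_attr.split()[1:]  # drop the leading "-" or "+" marker
    return [tok.split("/", 1)[1] for tok in reversed(tokens)]


def render(class_attr, text, handlers):
    """Render with the most specific handler the receiver actually has."""
    for name in ancestry(class_attr):
        if name in handlers:
            return handlers[name](text)
    return text  # nothing matched: pass the content through untouched


# The receiver's toolchain knows only base types; "checklist" is new to it.
receiver_handlers = {
    "ul": lambda text: "* " + text,
    "p": lambda text: text + "\n",
}

# The sender's toolchain also knows the (hypothetical) specialized type.
sender_handlers = dict(receiver_handlers, checklist=lambda text: "[ ] " + text)

markup = "+ topic/ul checklist-d/checklist "
print(render(markup, "Back up your data", receiver_handlers))
print(render(markup, "Back up your data", sender_handlers))
```

The receiver has never seen a checklist, but it formats the content as the unordered list it is derived from. That is the “good enough” result; the sender's own toolchain gets the richer one.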

If you're carrying around the assumptions I outline above, blind interchange is a manifestly absurd notion. But if you relax your assumptions to perhaps more accurately reflect the twenty-first century, then maybe blind interchange becomes not just possible, but practical.

And maybe, just maybe, that makes specialization interesting.

Don't pay any attention to those creaking, scraping noises that you hear. That's just me rearranging my mental furniture. More to follow.

Comments

Hmm. I still don't believe that DITA-style specialization is worth the overhead it adds. Also, what can you do with DITA-style specialization that you can't do with an attribute (e.g. class="checklist") and help from your authoring tool (i.e. the tool presents "checklist" as if it were an element you can insert, even though it really inserts an existing element carrying that class attribute)?

Yes, the overhead concerns me too. That's one of the essays to come.

The overhead that specialization adds amounts to only slightly more than the overhead that a class attribute adds to a document. True, there's the definition of the specialized object (whether it be a domain, element, or attribute) and the required processing to render the specialized object.

However, keeping in mind blind interchange, those costs are not borne by the receiver of the content because the default processing for the more generalized object comes into play. The bottom line is that the overhead is paid by the organization that requires the more specific content model and no other organization (unless they require the same specificity).

It seems to me that DITA specialization is about more than just interchange, since it provides a somewhat more formal structure for customization. Where DocBook makes it easy to add new elements to the syntax, DITA goes one further by allowing that element to declare the basis of its semantics, as well. Your DITA-aware toolchain (editor, publishing system, possibly your CMS) already knows the basic behaviors to apply to the element, from which you can then customize.

I think this structure also does more for interchange with DITA than you give it credit for. If I customize DocBook, I might not be able to send my documents to anyone else to work with without my customizations, since they won't be "standard DocBook" anymore, unless my customizations are strictly limited to just reduced content models. If I specialize DITA, I can always very easily produce "standard DITA", at the worst, or send just my DTD specialization module, and any other DITA-aware system will be able to do basic processing.
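As a rough illustration of what producing "standard DITA" can involve, the sketch below renames each specialized element to the base type named first in its class attribute, a step usually called generalization. It assumes the class attributes are already present on every element (in practice they are often supplied as DTD defaults), and the checklist element and checklist-d module are hypothetical.

```python
import xml.etree.ElementTree as ET


def generalize(element):
    """Rename each element to the base type named first in its class attribute."""
    for node in element.iter():
        tokens = node.get("class", "").split()
        if len(tokens) >= 2:  # a marker plus at least one module/type pair
            node.tag = tokens[1].split("/", 1)[1]
    return element


# A hypothetical specialized list, with class attributes made explicit.
specialized = ET.fromstring(
    '<checklist class="+ topic/ul checklist-d/checklist ">'
    '<li class="- topic/li ">Back up your data</li>'
    '</checklist>'
)

print(ET.tostring(generalize(specialized), encoding="unicode"))
```

The root element comes out as a plain ul, while the class attribute is left untouched, so a system that does know the specialization could still recover the original type.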

So, you say, this is just the "blind" interchange scenario. However, DITA specialization encourages creation and sharing of different types of specializations, such as industry-specific tag sets, which can be further specialized by individual companies. So, there's a good chance that the other party with whom I want to share my content will be able to process my content at a level higher than just standard DITA, since they're likely to be using, or at least have access to, that same industry-specific specialization.

This is certainly more than "blind" interchange, and while it may not be "20/20", it is still a step in the right direction. In short, DITA provides an architecture that encourages carefully-planned, downward-compatible customization. The architecture also provides excellent support for multiple layers of customization, each building on the last, so that different groups can interoperate at the highest layer shared by both parties.

I don't think you were wrong the first time around. At the end of the specialization piece you wrote:

"If experience with DocBook is indicative, and I think it is, very few users are every going to make any customizations to the markup at all."

That has been my experience as well. Customizations are few and far between, pretty much regardless of the DTD or schema used. Of course, individual users might do them but organizations for the most part do not. As for specializations in DITA, I've regarded them as a fallback mechanism anyway, not something that's done regularly.

Then again, I could be wrong, too.

I question the statement "The only difference between DITA and DocBook is specialization, and specialization is why DITA is better." What about conref and keyref? Yes, DocBook has XInclude, but conref validates and keyref enables indirect addressing. How can I accomplish those things in DocBook?