XML Interop, DocBook, and Ease of Use

Volume 7, Issue 25; 19 Feb 2004; last modified 08 Oct 2010

Is XML really all it’s hyped up to be?

Perhaps the most valuable result of all education is the ability to make yourself do the thing you have to do, when it ought to be done, whether you like it or not.

Thomas H. Huxley

I recently created a PubSub subscription to “DocBook” content. That means I get a daily feed of all the blog entries that PubSub knows about that contain the word “DocBook”.

<aside>Here’s a really good use for atom:id. I don’t need to see every syndication of a particular essay in every different feed it appears in. I only care about seeing it once.</aside>
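A reader-side filter along these lines can be sketched in a few lines of Python. This is only an illustration, assuming Atom 1.0 feeds (the `http://www.w3.org/2005/Atom` namespace) that have already been fetched as strings. The function name `unique_entries` is hypothetical, and nothing here describes how PubSub or any particular aggregator actually works.

```python
import xml.etree.ElementTree as ET

# Assumption: the feeds use the Atom 1.0 namespace (RFC 4287).
ATOM = "{http://www.w3.org/2005/Atom}"


def unique_entries(feed_documents):
    """Yield each entry once across several feeds, keyed by its atom:id."""
    seen = set()
    for feed_xml in feed_documents:
        root = ET.fromstring(feed_xml)
        for entry in root.iter(ATOM + "entry"):
            entry_id = entry.findtext(ATOM + "id")
            if entry_id is not None:
                entry_id = entry_id.strip()
                if entry_id in seen:
                    continue  # same essay, syndicated through another feed
                seen.add(entry_id)
            # Entries without an id cannot be deduplicated, so pass them through.
            yield entry
```

The comparison is a plain string match on the identifier, so it only helps when every syndication of an essay carries the same atom:id.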

My DocBook subscription led me to “They Lied to Us” by Dan Moniz. It’s a brief IRC snippet bemoaning the lack of XML interop. When I asked Dan what he meant, he pointed me to a longer essay on the topic.

By and large, everything he says there is true. In fact, the only really great XML editor I’ve ever seen is Arbortext’s Epic. (The standard disclaimer applies here: I used to work for Arbortext. I contributed to the first couple of Epic releases. I’m not claiming to be unbiased, but I am trying.) It’s great but it’s horribly expensive. (And I think there are good economic reasons why it’s likely to remain that way.)

(My point in this essay is not to start some sort of war over who has the best editor. There are some inexpensive, even free, XML editors out there that are pretty good. If they were as good as Epic, the following points would apply to them too.)

The real challenge in writing a good XML editor is two-fold:

  1. If you try to be general and write “an XML editor,” you are facing a serious engineering challenge. And at the end of the day, you better provide a pretty sophisticated scripting language with pretty much complete access to the editor’s data structures if you hope to be able to use your “XML editor” to edit “HTML documents” or “DocBook documents” or “TEI documents” or any specific vocabulary.

    If you try to write “a DocBook editor,” you’ll have a slightly easier time because you’ve eliminated a lot of variables. You’ve also eliminated all your markets except one.

    In fact, the reason Epic is so damned good is that it’s really an excellent XML editor, Adept, with a lot of customization, mostly using the native scripting facilities, but with help in the application sources where necessary, to make it a DocBook editor, Epic. (Or it was that way when I left Arbortext. It’s been a few years and things may be different now. I don’t claim to know.)

  2. Writing structured documentation is hard. It’s a different way of thinking about content. That’s not a technical problem, it’s a social/authoring problem. I have very little advice about how to solve that. If you have a stick (do it the way we say or you’re fired), maybe you can do it. If you have a carrot (do it the way we say and you get a huge reward), maybe you can do it. If you manage to get a bunch of markup geeks who think structured authoring is natural and logical and wouldn’t dream of doing it any other way, you can probably do it.

    But if you start with some regular Word users and try to get them to do structured editing, you often encounter resistance.

Dan points out, fairly I think, that there’s been a lot of hype about the benefits of XML and how easy it’s going to make things. If you believe all that hype, the first few weeks in the trenches must be a real shock.

I’m trying to decide if I’ve contributed to that hype. I’m not sure. XML authoring, and DocBook authoring in particular, are easy by some metrics. For example, I do it in a free editor. By that metric, Word is extremely hard for me. I’d have to change operating systems ($$$) and buy the application ($$$). (Yes, I could use OpenOffice; in fact, I do sometimes, but that’s not really the point.)

The point is it depends on your metrics. It is dead easy for me to publish an essay like this one in HTML and PDF, and to syndicate it in RSS and Atom, and to generate metadata that can be queried. I could easily generate other forms as well.
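To make the single-source idea concrete, here is a minimal sketch in Python that derives both an HTML page and an Atom entry from one small DocBook-like article. It is not the toolchain behind this site, which the essay does not describe. The function names, the `tag:example.org` identifier in the usage note, and the choice of the first paragraph as the summary are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace; the source article is un-namespaced DocBook 4 style.
ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def flat_text(element):
    """Return the text of an element with inline markup removed."""
    return "".join(element.itertext())


def to_html(article):
    """Render the title and paragraphs of an article element as a bare HTML page."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h1").text = article.findtext("title")
    for para in article.iter("para"):
        ET.SubElement(body, "p").text = flat_text(para)
    return ET.tostring(html, encoding="unicode")


def to_atom_entry(article, entry_id):
    """Build an Atom entry from the same article, using the first para as summary."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = article.findtext("title")
    first_para = next(article.iter("para"), None)
    if first_para is not None:
        ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = flat_text(first_para)
    return ET.tostring(entry, encoding="unicode")
```

Calling `to_html(article)` and `to_atom_entry(article, "tag:example.org,2004:essay")` on the same parsed element yields two outputs from one source. A real pipeline would more likely use XSLT stylesheets than hand-written functions like these.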

Using XML also provides long term flexibility. Sometimes that’s a good metric. Here’s a simple example: JAXP 1.3 is being authored in DocBook. JAXP 1.2 was authored in Frame or something. When I posted a pointer to the public draft of JAXP 1.3, John Cowan pointed out that without some sort of diff, it was going to be just about impossible for him to review it. Fair enough. And for JAXP 1.4, I’ll be able to generate that, because JAXP 1.3 (and 1.4, if I’m working on it) will be in XML. Maybe there’s a Frame tool that could have done the job, but I don’t have the right OS or the application.
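The kind of diff this makes possible can be sketched with nothing but the Python standard library. The example below is a rough illustration, not the tool used for the JAXP drafts. It assumes both versions are un-namespaced DocBook whose prose lives in `para` elements, it ignores structure and inline markup, and the function names are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def paragraphs(docbook_xml):
    """Return the text of every para element, with whitespace normalized."""
    root = ET.fromstring(docbook_xml)
    return [" ".join("".join(p.itertext()).split()) for p in root.iter("para")]


def diff_versions(old_xml, new_xml):
    """Return a unified diff of the paragraph text of two document versions."""
    return list(
        difflib.unified_diff(
            paragraphs(old_xml),
            paragraphs(new_xml),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )
```

Because the comparison runs on extracted text rather than on the raw files, reflowed lines and changed indentation in the source do not show up as differences. A `para` nested inside another `para` (through a list, say) would be counted twice, which a real tool would need to handle.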

At the end of the day, though, I’m just one of those markup geeks who does this because it’s obviously the right way to do it. Obvious to me, anyway. I liked WordStar “dot commands” for crying out loud. I abandoned word processors the minute I found Script. I was so determined to stay away from word processors that when running Script on the mainframe became irksome, I even tried to write my own Script processor for the PC. I abandoned Script when I found TeX and TeX when I found SGML and SGML when XML came along.

I never wanted a word processor, I always wanted a document engineering language. XML is the best one around. Will I abandon XML when something better comes along? I dunno. Emotionally, I feel like I have a lot invested in XML. But history suggests I’ll jump ship when the time comes.

In the meantime, yeah, for just about everyone, it’s harder to do XML than not. For most people, the extra effort may never be worth it. For some people, it will be worth it sometimes. For a few people, it will be worth it most of the time. For me, well, it really is easier than the alternatives, but I wouldn’t pretend that was true for very many people.

I think that structured authoring, and particularly DocBook, are absolutely, always worth it for large documentation projects (like documenting major open source systems) where you need a non-proprietary solution that works on any platform and where longevity is an important factor. XML is always worth it for large, long-term, distributed projects. DocBook is the right tool for computer software and hardware documentation. For other kinds of projects, other vocabularies, other XML vocabularies, might be better.

But I would say that, wouldn’t I?

Comments

There _are_ free structured authoring tools, but they really suck. If anybody were able to combine the strength of Amaya (and extend it to at least DocBook) with the ease of use of LyX, I would be immediately game.

—Posted by Matej Cepl on 21 Feb 2004 @ 03:44 UTC #

And still no one addresses this difficulty of enabling Word users to edit in XML. It comes up so often, and it's a real market, addressed in isolation so many times for solid business reasons. I wonder why not?

—Posted by Dave Pawson on 22 Feb 2004 @ 09:22 UTC #

Basically the benefit of Word is that a user knows a simple subset of the interface (since hardly any users really know a lot of Word) and can use that. Any Word extended to allow editing of XML will have side effects on user behavior that abrogate this benefit. Either the Word interface will have so many extra buttons and steps that it is no longer Word, or the Word document itself must be structured. For example, if I want a tree of -subjects -subject -description 'xml editors' -related 'markup'-/related -/description -/subject -/subjects I might say that the user has to use subjectlist.dot to make their document, use the subjects style for the list, and that every item in that list can have a number of paragraphs, the style names being related to the output names. If you have a dialect that requires -image @src='some.gif' -subtext a picture of our xml editor -/subtext -/image then you need to be able to process Word's flat structure and get a style image-subtext directly following an image in order to build your hierarchy. This is an already solved problem (I know, as I worked on a solution); however, it requires that the user use Word in a structured manner, which most users are not in the habit of doing.

From tests I've done, the same solution can be applied to OpenOffice Writer and, I suppose, StarOffice Writer, etc.

—Posted by bryan bry@itnisk.com on 24 Feb 2004 @ 04:23 UTC #

The xxe application (XMLmind XML Editor: http://www.xmlmind.com/xmleditor/) is actually quite nice. It is free (for the standard edition) and very powerful. It has good DocBook support integrated by default and is flexible enough to handle extensions (to the XML language as well as to the app). I'm in no way affiliated with XMLmind, other than being a happy user of their product.

—Posted by William McVey on 30 Mar 2004 @ 12:06 UTC #