DocBook and HTML 5(.x)

Volume 17, Issue 9; 23 Jul 2014

HTML 5(.x) today reminds me a lot of DocBook 1(.x) twenty years ago. That's neither criticism nor compliment, merely observation.

Some random wind blew the HTML 5(.1) “picture” element across my desk today. That led me to a page somewhere that enumerated all of the proposals for HTML 5.x elements in their various stages of standardization.

That's drifted back through my consciousness several times today until finally I realized why. The reason is: it reminds me a whole lot of DocBook twenty years ago. Hear me out.

Twenty years ago, DocBook had a relatively small number of tags. Like HTML of today, it had enough markup to do articles and sections and paragraphs and images and block quotations and a short list of other things.

Twenty years ago, DocBook had a selection of specialized elements in addition to the basic structural elements necessary to capture expository prose. HTML has them too; the specializations are different, but that's not surprising.

DocBook was about interchange, so there was a fairly diligent effort undertaken to make sure that the processing expectations of each element were clearly defined. The variety of outputs imagined and the fact that the DocBook community had nearly no appreciable influence over the development of the platforms that would support those outputs meant that there was a certain vagueness, but we have always cared. HTML, the specification, goes to great lengths to describe the processing expectations of…everything, not just proper, valid markup but essentially every sequence of characters. HTML is as much about interoperability of browsers as anything else and so there's tremendous effort undertaken to ensure that interoperability.

DocBook had a relatively large and diverse community of users (some significant fraction of techpubs plus a smattering of other fields of publication). OK, HTML's relatively large and diverse community (basically everyone everywhere) eclipses the DocBook community the way the population of beetles on Earth dwarfs, say, the human population of Rhode Island, but we're talking relative scales.

An interesting thing about a large and diverse community of users is that they have different interests and different requirements. And if the community is big enough, you wind up with tags that are of interest to “a large group of people” who are still a relatively small group compared to the whole. DocBook certainly has markup that “almost everyone” never uses, and that I sometimes wish we hadn't invented. It exists because various groups of users, perceived at the time to be of significant size, were able to make a compelling case for it.

HTML, like DocBook, has a committee of developers who respond to requests for new tags, proposals for new tags, proposals for changes to tags, proposals for extensions to tags, proposals for the removal of tags, etc. And like any committee, it attempts to establish guidelines and policies and undertakes to serve its community as best it can.

DocBook today has a quite large set of tags. Large enough that lots of folks want subsets. I don't know if HTML has become that large yet, but I bet it will.

HTML's evolution is never going to resemble DocBook's evolution more than superficially. The HTML community has direct and compelling influence on the platform that supports it (or maybe it's the other way around). DocBook still focuses on encoding technical documents; most of the HTML effort seems to be about developing an open, portable application development framework. Nothing wrong with that, except to the extent that it seems to marginalize other goals for the web, which, no doubt, one could argue it doesn't.

There's nothing profound in these observations, but I look forward to seeing what HTML is like in twenty years. And DocBook too, of course. I wonder if HTML will have twenty-year-old legacy markup that almost no one uses or if they'll be able to keep things tidier. The fact that HTML is being developed for effectively a single, global platform (or a platform that appears to be that way from most angles) means there's more opportunity for deprecation, I suppose.