<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="pto" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#">
<info>
    
    
    
    
    
    
    
    
    
    
<title>Why Refactor DocBook?</title>
<biblioid class="uri">http://norman.walsh.name/2003/06/16/whyrefactor</biblioid>
<volumenum>6</volumenum>
<issuenum>38</issuenum>
<pubdate>2003-06-16</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2003</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>More thoughts on refactoring.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#DocBook"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
</info>

<para xml:id="p1"><personname>
      <firstname>Michael</firstname>
      <surname>Smith</surname>
    </personname>
<link xlink:href="http://lists.oasis-open.org/archives/docbook/200306/msg00006.html">asked me</link>
to clarify the motivations I have for wanting to refactor DocBook.
I <link xlink:href="http://lists.oasis-open.org/archives/docbook/200306/msg00050.html">did so</link> on the list, but I'm putting my ideas online
here as well for consistency.</para>

<para xml:id="p2">These are some further thoughts on why I think now is the time to
refactor DocBook. Apologies, in advance, if some of these issues have
already been discussed on
<link xlink:href="http://lists.oasis-open.org/archives/docbook/">the list</link>
recently. I haven't caught up yet. I wrote these thoughts while I was
disconnected on the plane ride home.</para>

<orderedlist>
<listitem>
<para xml:id="p3">The single most compelling reason, the reason that I think would be
sufficient if it were the only reason, is that DocBook has become
brittle. It has grown, slowly and reasonably conservatively but
continuously, for many years. Changes that were each individually
small and well conceived form quite a tenuous pile when taken all
together. Look at the number of class and mixture parameter
entities we now have. Many are very similar but not the same. Can
you tell from inspection why they aren't the same? Is the
organizing principle that created them discernible? I don't think
so. As the current maintainer, I'm aware that this is my fault to
one degree or another.
</para>

<para xml:id="p4">Whatever the cause, and irrespective of whether or not it was
avoidable, we've reached the point where my software engineering
experience suggests that continuing on a path of
accumulating patches is not practical.
</para>
</listitem>

<listitem>
<para xml:id="p5">DocBook was conceived, designed, and built within the limiting
framework of SGML and then XML DTDs. In some ways it stands as a
testament to just how much you could do with those technologies.
But they are hardly modern.</para>

<para xml:id="p6">For a project as large and important (if one measures importance in
terms of number of users or amount of legacy, at least) as DocBook,
I think novelty for novelty's sake would be a very bad idea indeed.
In fact, if all things were equal, I don't think it would be
inappropriate for DocBook to lag behind the technology curve. It
needs to be stable and reliable.
</para>

<para xml:id="p7">But all things are not equal. I think we've passed a complexity
threshold beyond which the parameter entity mechanisms available in
DTDs are simply not up to the task of supporting further
development. I am not suggesting, and have never intended to suggest,
that DocBook shouldn't be available as a DTD for many years to come. I
just don't think that the DTD should be the <quote>source format</quote>, the
format upon which further development and customization are based.
</para>
</listitem>

<listitem>
<para xml:id="p8">Engineering advances do not proceed smoothly and uniformly over
time. Instead, they proceed in fits and starts, with watershed
events spurring periods of rapid development. I think RELAX NG is a
watershed event in markup languages.</para>

<para xml:id="p9">DocBook hasn't suddenly become unmanageable because we added one more
tag. The development of DocBook has been straining the bounds of
DTD development for some time. I have been thinking about how to
make progress, about how to perform a refactoring (although I'm not
sure I was consciously aware that that was what I was considering)
for several years. The famous <quote>PE reorganization</quote> RFE has existed
for at least five years. I've considered, and even prototyped,
several possible approaches.
</para>

<para xml:id="p10">RELAX NG changes the validation
model just a little bit. It removes some restrictions and allows us
to think about validation in a different way. Suddenly I see a
clear path forward, a way to build a much simpler, more coherent,
more easily customizable DocBook framework.
</para>

<para xml:id="p11">Now, at the moment, I have only a vision, and a few sketchy
prototypes. I don't have enough running code to be certain my ideas
will work. But I feel pretty confident.
</para>
</listitem>

<listitem>
<para xml:id="p12">Tools exist (thank you again, James) that will allow us to continue
to support existing tools and applications even as we move forward.
If moving to RELAX NG required us to turn our backs on every
DTD-based XML tool that processes DocBook, the very idea of doing
it would be very much D.O.A.<footnote>
	  <para xml:id="p13">
	    <quote>Dead On Arrival.</quote>
</para>
	</footnote>
</para>

<para xml:id="p14">My vision for the intermediate future is one where DocBook is
maintained in RELAX NG and where customization layers (both
extensions and subsets) are devised at the RELAX NG level. But DTDs
are still provided by translating the RELAX NG grammars with
<application>Trang</application>.
</para>
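
<para>As a rough sketch only, and not a worked-out design, a
customization layer at the RELAX NG level might look something like
the following. The file name <filename>docbook.rng</filename> and every
pattern name in it are hypothetical; I'm assuming a base grammar that
defines patterns with those names. The <tag>include</tag>,
<tag>define</tag>, and <tag>notAllowed</tag> elements and the
<tag class="attribute">combine</tag> attribute are ordinary RELAX NG.
</para>

<programlisting language="xml"><![CDATA[<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- "docbook.rng" and all pattern names here are hypothetical. -->
  <include href="docbook.rng">
    <!-- Subset: a definition inside include replaces the one in the
         included grammar, so this element can no longer appear. -->
    <define name="hypothetical.sidebar">
      <notAllowed/>
    </define>
  </include>

  <!-- Extension: add one more alternative to an existing class. -->
  <define name="hypothetical.inline.class" combine="choice">
    <ref name="hypothetical.projectname"/>
  </define>

  <!-- The new inline element itself. -->
  <define name="hypothetical.projectname">
    <element name="projectname">
      <text/>
    </element>
  </define>
</grammar>]]></programlisting>

<para>If something along these lines works, the subset and the
extension each take a handful of lines and neither one requires
untangling a web of parameter entities.
</para>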

<para xml:id="p15">The DTDs will probably not validate
precisely the same documents as the RELAX NG grammar. The extent to
which there is variation will depend in part upon how we design
DocBook, but I don't think perfect fidelity should be a goal.
</para>

<para xml:id="p16">If perfect fidelity isn't possible, why bother? Because even a
slightly less constrained schema can still be used to drive editing
tools like Emacs and Epic. And it will allow all the existing
DTD-based tools to continue to offer some level of validation.
(They'll be able to find simple typos, for example, even if they
can't enforce every constraint.)
</para>
</listitem>

<listitem>
<para xml:id="p17">DocBook needs to be able to adapt to a changing world. I've already
found several occasions, for example, in which it would have been
convenient for DocBook to have been in a namespace. I can imagine
scenarios where it would be almost necessary. No matter what you
think about namespaces, I think they're here to stay. I don't see
any long term viability to an attitude of refusing to use them, at
least judiciously.
</para>
</listitem>

<listitem>
<para xml:id="p18">I think similar arguments can be made for the judicious use of
simple data types, although I'm by no means certain of that. I can
imagine, for example, that there might be value in validating that
the content of the <tag>date</tag> element is, in fact, a date. And
even more potential value in being able to sort dates and other
simple values <quote>correctly</quote>.
</para>
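
<para>For illustration, and assuming the W3C XML Schema datatype
library is the one used (that choice is my assumption here, not a
decision), a RELAX NG pattern that checks the content of
<tag>date</tag> could be as small as this:
</para>

<programlisting language="xml"><![CDATA[<!-- A sketch: constrain the content of date to the xsd:date type. -->
<element name="date"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="date"/>
</element>]]></programlisting>

<para>With a pattern like that, <literal>2003-06-16</literal> would be
accepted and <literal>16 June 2003</literal> would be rejected, because
the <literal>date</literal> type requires the ISO 8601 form. Whether
that strictness is a feature or a burden for authors is exactly the
sort of thing I'm not certain about.
</para>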
</listitem>

<listitem>
<para xml:id="p19">I think DocBook is a world leader in its class. I think there's an
opportunity here to continue that leadership role and I think we
should take that opportunity. We should reinvent DocBook for the
modern markup world.
</para>

<para xml:id="p20">I don't think anything I'm suggesting is radical. I don't propose
that we invent something that's going to be maliciously (or
capriciously) incompatible with the current needs or even the
current markup of existing users.
</para>

<para xml:id="p21">It's just time to refactor. I think that's a natural part of the
life cycle of a software system that's in the middle of its
productive lifespan.
</para>
</listitem>
</orderedlist>
</essay>

