<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
    
    
    
    
    
    
    
    
    
<title>Stylesheet organization</title><biblioid class="uri">http://norman.walsh.name/2008/01/01/docbookStylesheets</biblioid>
<volumenum>11</volumenum>
<issuenum>001</issuenum>
<pubdate>2008-01-01T13:58:21-05:00</pubdate>
<date>$Date$</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2007</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>The XSLT 2.0 stylesheets for DocBook are broken. They have been
for a while, but I think maybe I've figured out how to fix them.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#DocBook"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XSLT2"/>
</info>

<para xml:id="p1">There's a pervasive complexity in the XSLT 1.0
stylesheets. It's not a bad or unnecessary complexity; it's caused, at
least in part, by the fact that there's a lot of flexibility in
DocBook.</para>

<para xml:id="p2">A small example: when a chapter is formatted, the chapter will
have a title and might have a subtitle (among other things); these
elements each individually might or might not be inside an info
element. For some elements, like bibliography, the situation is even
more complicated because the title is optional (if not present, a
locale-specific default title must be used).
</para>

<para xml:id="p3">In designing the XSLT <emphasis>2.0</emphasis> stylesheets for
DocBook, I wanted to factor out some of this complexity. The result
is a multi-phase transformation:</para>

<orderedlist>
<listitem>
<para xml:id="p4">The first phase adds an <tag class="attribute">xml:base</tag>
attribute to the root element so that URI resolution in subsequent
phases will know the correct base URI.</para>
<para xml:id="p5">It resolves <tag class="attribute">entityref</tag> attributes
into <tag class="attribute">fileref</tag> attributes because
subsequent phases won't have access to the entity declarations in the
original document.</para>
<para xml:id="p6">If the root element does not declare the DocBook
namespace, it moves all elements in no namespace into the DocBook
namespace. This allows the XSLT 2.0 stylesheets to format DocBook V4.x
documents; in fact, not only are the elements moved into the DocBook
namespace, but they're transformed a bit too, so that they are more
likely to conform to DocBook V5.x. This is a convenience only: the XSLT
2.0 stylesheets aren't guaranteed to format DocBook V4.x
correctly.</para>
</listitem>
<listitem>
<para xml:id="p7">The second phase applies profiling.</para>
</listitem>
<listitem>
<para xml:id="p8">The third phase normalizes markup. This is the phase that reduces
the complexity described above: <tag>info</tag> elements are added uniformly
(so all elements that can be inside or outside of an info are always
inside), defaulted titles are made explicit, and a number of other changes
are made to make the markup more regular.</para>
</listitem>
</orderedlist>
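
<para>As a rough sketch of how phases like these can be chained inside a
single XSLT 2.0 stylesheet, each phase can build a temporary tree in a
variable and hand it to the next one. This is only an illustration, not
the code in the DocBook stylesheets: the mode names
(<literal>cleanup</literal>, <literal>profile</literal>,
<literal>normalize</literal>, <literal>format</literal>) and the variable
names are made up for the example, and the identity template stands in
for the real work of each phase.</para>

<programlisting>&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0"&gt;

  &lt;!-- Hypothetical root template that runs the phases in order. --&gt;
  &lt;xsl:template match="/"&gt;
    &lt;!-- Phase 1: xml:base, entityref, namespace fixup --&gt;
    &lt;xsl:variable name="cleaned" as="document-node()"&gt;
      &lt;xsl:document&gt;
        &lt;xsl:apply-templates select="." mode="cleanup"/&gt;
      &lt;/xsl:document&gt;
    &lt;/xsl:variable&gt;

    &lt;!-- Phase 2: profiling --&gt;
    &lt;xsl:variable name="profiled" as="document-node()"&gt;
      &lt;xsl:document&gt;
        &lt;xsl:apply-templates select="$cleaned" mode="profile"/&gt;
      &lt;/xsl:document&gt;
    &lt;/xsl:variable&gt;

    &lt;!-- Phase 3: markup normalization --&gt;
    &lt;xsl:variable name="normalized" as="document-node()"&gt;
      &lt;xsl:document&gt;
        &lt;xsl:apply-templates select="$profiled" mode="normalize"/&gt;
      &lt;/xsl:document&gt;
    &lt;/xsl:variable&gt;

    &lt;!-- Finally, format the regularized tree. --&gt;
    &lt;xsl:apply-templates select="$normalized/*" mode="format"/&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Placeholder: each phase here just copies its input unchanged. --&gt;
  &lt;xsl:template match="@*|node()" mode="cleanup profile normalize"&gt;
    &lt;xsl:copy&gt;
      &lt;xsl:apply-templates select="@*|node()" mode="#current"/&gt;
    &lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>

<para>Note that in this arrangement the whole pipeline hangs off the
<literal>match="/"</literal> template, so anything that replaces that
template also replaces the pipeline.</para>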

<para xml:id="p9">The problem with this approach is that it interferes with users'
expectations. Not stylesheet users per se, but stylesheet customizers.
If a customizer wants to change the root template (to tinker with the
HTML page metadata, for example), they're naturally going to add
a new root template to their customization layer:</para>

<programlisting>&lt;xsl:template match="/"&gt;
…
&lt;/xsl:template&gt;</programlisting>

<para xml:id="p10">Trouble is, in this multi-phase approach, that just
completely breaks everything. The root template is what runs the
phases, and if you override it, the phases never run and it all goes
pear-shaped.</para>

<para xml:id="p11">What you have to do instead is add this template to your customization
layer:</para>

<programlisting>&lt;xsl:template match="*" mode="m:root"&gt;
…
&lt;/xsl:template&gt;</programlisting>

<para xml:id="p12">Now, I suppose, in the grand scheme of things that's not
<emphasis>so</emphasis> bad, but it's <emphasis>ugly</emphasis>.</para>

<para xml:id="p13">My first approach to fixing this problem was to break the
DocBook stylesheet into explicit, discrete phases. Unfortunately, this
requires the user to actually set up a pipeline and run a series of
separate transformations. In a
post-<link xlink:href="http://www.w3.org/TR/xproc/">XProc</link>
world, this might actually be OK, but today, it's setting the bar
<emphasis>awfully</emphasis> high for your average user.</para>

<para xml:id="p14">So I have a new approach. Instead of using the root
template to run the phases, I use a named template. One of the features
of XSLT 2.0 is that you can start a transformation with a named
template.</para>

<para xml:id="p15">Using this approach, you must tell the processor to start with
the right template (using <option>-it:format-docbook</option> in the
<wikipedia page="Saxon_XSLT">Saxon</wikipedia> command-line case), but
otherwise, you're free to customize the “/” template to your heart's
content.</para>
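
<para>Here is a minimal sketch of what such an entry point might look
like. Only the template name <literal>format-docbook</literal> comes from
the stylesheets; the <literal>prepare</literal> mode (standing in for all
of the phases) and the variable name are invented for illustration, and
the sketch assumes the processor supplies the source document as the
context node.</para>

<programlisting>&lt;xsl:template name="format-docbook"&gt;
  &lt;!-- Run the preparatory phases on the source document.
       "prepare" is a hypothetical mode, not a real one. --&gt;
  &lt;xsl:variable name="prepared" as="document-node()"&gt;
    &lt;xsl:document&gt;
      &lt;xsl:apply-templates select="/" mode="prepare"/&gt;
    &lt;/xsl:document&gt;
  &lt;/xsl:variable&gt;

  &lt;!-- Ordinary processing starts here, so a customized "/"
       template sees the prepared tree instead of replacing
       the phases. --&gt;
  &lt;xsl:apply-templates select="$prepared"/&gt;
&lt;/xsl:template&gt;</programlisting>

<para>With Saxon 9, an invocation along these lines should do it. The jar
and file names are placeholders; adjust them for your installation.</para>

<screen>java -jar saxon9.jar -s:book.xml -xsl:docbook.xsl -it:format-docbook -o:book.html</screen>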

<para xml:id="p16">But you're hosed if you forget to start at the right named
template. To detect this error, the <literal>format-docbook</literal>
template injects a harmless processing instruction before the document
element so that subsequent processing can determine if the user forgot
to use the named template.</para>
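
<para>One way the check could work is sketched below. The processing
instruction name <literal>docbook-phases-run</literal> is made up, the
error message is mine, and putting the test in the “/” template is just
one possible placement; the real stylesheets may well do it
differently.</para>

<programlisting>&lt;xsl:template name="format-docbook"&gt;
  &lt;xsl:variable name="marked" as="document-node()"&gt;
    &lt;xsl:document&gt;
      &lt;!-- Hypothetical marker, placed before the document element. --&gt;
      &lt;xsl:processing-instruction name="docbook-phases-run"/&gt;
      &lt;!-- Assumes the context node is the source document node;
           the real template would run the phases here. --&gt;
      &lt;xsl:copy-of select="node()"/&gt;
    &lt;/xsl:document&gt;
  &lt;/xsl:variable&gt;
  &lt;xsl:apply-templates select="$marked"/&gt;
&lt;/xsl:template&gt;

&lt;xsl:template match="/"&gt;
  &lt;!-- No marker means processing did not start at format-docbook. --&gt;
  &lt;xsl:if test="not(processing-instruction('docbook-phases-run'))"&gt;
    &lt;xsl:message terminate="yes"&gt;
      &lt;xsl:text&gt;Start the transformation with -it:format-docbook.&lt;/xsl:text&gt;
    &lt;/xsl:message&gt;
  &lt;/xsl:if&gt;
  &lt;xsl:apply-templates/&gt;
&lt;/xsl:template&gt;</programlisting>

<para>The marker is harmless because templates that match elements never
see it, and it only exists in the temporary tree, not in the
output.</para>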

<para xml:id="p17">In the short and medium term, I think this is the right approach.
In the long term, XProc will rule the world and we can simplify things
further with a series of explicit transformations.</para>

<para xml:id="p18">I've got this implemented in the “<literal>xsl2-namedt</literal>” branch
of <link xlink:href="https://sourceforge.net/svn/?group_id=21935">the repository</link>.</para>

<para xml:id="p19">If no one tells me why this approach is either stupid or insane,
or both, I'll probably move it to the trunk in a couple of weeks.</para>

</essay>

