<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       version="pto">
<info>
<title>More Ruminations on DocBook</title>
<volumenum>6</volumenum>
<issuenum>24</issuenum>
<pubdate>2003-05-29</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2003</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Some ideas about what a refactored DocBook might look like, and a prototype.
</para>
</abstract>
</info>
<epigraph>
<attribution>Elias Canetti</attribution>
<para xml:id='p1'><indexterm><primary>Canetti</primary><secondary>Elias</secondary></indexterm>He
would like to start from scratch. Where is scratch?</para>
</epigraph>

<para xml:id='p2'>It doesn't seem quite fair to <link xlink:href="../21/docbook">suggest
scrapping DocBook</link> without at least considering what should
replace it. And rather than waiting until I'm finished, I think it
probably makes sense to publish what I've cooked up. It's maybe
three-quarters finished, maybe a little more. In any event, it's just
one guy's idea.</para>

<para xml:id='p3'>In general terms, my changes fall into four categories: rationalize
the content model of inlines, normalize the metadata, discard cruft, and
make changes that appear (to me) to simplify things.</para>

<section xml:id='s1'>
<title>Rationalizing Inlines</title>

<para xml:id='p4'>I've divided inlines into three classes: ubiquitous inlines
(ones that should be available <emphasis>everywhere</emphasis>),
general inlines, and domain-specific inlines.</para>

<para xml:id='p5'>In trying to find a design principle to discriminate between what
should go in the content model of a particular inline and what should not,
I eventually settled on a simple one: any given inline contains just text
or it contains every inline. In my prototype, a lot of inlines contain just text.</para>

<section xml:id='s2'>
<title>Just Text</title>

<para xml:id='p6'>Given that there are some ubiquitous elements, what does <quote>just
text</quote> mean? It means the following:</para>

<programlisting>ubiq.inlines = db.inlinemediaobject
             | db.anchor
             | db.indexterm
             | db.remark
docbook.text = text | ubiq.inlines
             | text.phrase | text.replaceable</programlisting>

<para xml:id='p7'>Anywhere that character data is allowed, so are
<tag>inlinemediaobject</tag> (because it's the traditional
DocBook way of allowing special characters; less necessary in XML but still
valuable enough in legacy terms to justify inclusion), <tag>anchor</tag>,
<tag>indexterm</tag>, <tag>remark</tag>, and special forms
of <tag>phrase</tag> and <tag>replaceable</tag>.</para>

<para xml:id='p8'>What's special about <tag>phrase</tag> and
<tag>replaceable</tag> in this context is that they contain
<quote>just text</quote>. In contexts where all inlines are allowed,
every inline is allowed inside <tag>phrase</tag> and
<tag>replaceable</tag> too.</para>
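
<para>As a rough sketch of how one element name can carry both content
models in RELAX NG, the grammar below defines a text-only form and a full
form of <tag>phrase</tag> and <tag>replaceable</tag>. Every pattern name
that does not appear in the listing above is an assumption made for
illustration, and the ubiquitous inlines are cut down to two with
simplified content; the prototype may well arrange this differently.</para>

<programlisting># A sketch only; the prototype's actual patterns may differ.
# db.all.inlines, db.phrase, db.replaceable, db.emphasis,
# db.filename, and db.para are hypothetical names.
start = db.para

# Ubiquitous inlines, cut down to two to keep the sketch short.
ubiq.inlines = db.anchor | db.remark
db.anchor = element anchor { empty }
db.remark = element remark { text }

# "Just text": character data, ubiquitous inlines, and the
# text-only forms of phrase and replaceable.
docbook.text = text | ubiq.inlines
             | text.phrase | text.replaceable
text.phrase      = element phrase      { docbook.text* }
text.replaceable = element replaceable { docbook.text* }

# "Every inline": the same element names with a wider content model.
db.all.inlines = text | ubiq.inlines
               | db.phrase | db.replaceable | db.emphasis
db.phrase      = element phrase      { db.all.inlines* }
db.replaceable = element replaceable { db.all.inlines* }
db.emphasis    = element emphasis    { db.all.inlines* }

# One inline that holds just text, and one container for every inline.
db.filename = element filename { docbook.text* }
db.para     = element para     { (db.all.inlines | db.filename)* }</programlisting>

<para>Under these definitions, an <tag>emphasis</tag> is valid inside a
<tag>phrase</tag> that sits directly in a <tag>para</tag>, but not inside a
<tag>phrase</tag> that sits in a <tag>filename</tag>.</para>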
</section>
</section>

<section xml:id='s3'>
<title>Normalizing Metadata</title>

<para xml:id='p9'>DocBook has a dozen or more flavors of metadata wrapper
(<tag>bookinfo</tag>, <tag>chapterinfo</tag>,
<tag>sidebarinfo</tag>, etc.). It has all these flavors because
DTDs only allow one content model
per element name and we wanted to provide some way for customizers to
require or restrict metadata in different contexts.</para>

<para xml:id='p10'>RELAX NG removes the restriction that there can only be one
content model per element name and allows us to replace all of these
multifarious elements with a single wrapper:
<tag>info</tag>.</para>

<para xml:id='p11'>Out of the box, <tag>info</tag> comes in three flavors: with a required
title, with an optional title, and with titles forbidden. The grammar is arranged so
that customizers who need or want to add more flavors can easily do so, without adding
more element names.</para>

<para xml:id='p12'>I've also taken the liberty of enforcing two additional rules:
<tag>title</tag>,
<tag>titleabbrev</tag>, and
<tag>subtitle</tag> must appear first (and in that order) if they're
allowed or required, and they may appear only once.</para>
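
<para>A minimal sketch of how the three flavors and the ordering rule
might be written, assuming hypothetical pattern names throughout and a
single stand-in element for the rest of the metadata. Which flavor a given
element uses is also an assumption here, as is the choice to let
<tag>titleabbrev</tag> and <tag>subtitle</tag> appear without a
<tag>title</tag> in the optional flavor.</para>

<programlisting># A sketch only; every pattern name below is hypothetical.
start = db.chapter

db.title       = element title       { text }
db.titleabbrev = element titleabbrev { text }
db.subtitle    = element subtitle    { text }

# Stand-in for the rest of the metadata vocabulary.
db.info.extra = element pubdate { text }

# Title elements come first, in a fixed order, at most once each.
db.titles.req = db.title,  db.titleabbrev?, db.subtitle?
# Assumption: in the optional flavor each element is independently optional.
db.titles.opt = db.title?, db.titleabbrev?, db.subtitle?

# Three flavors, one element name.
db.info.title.req       = element info { db.titles.req, db.info.extra* }
db.info.title.opt       = element info { db.titles.opt, db.info.extra* }
db.info.title.forbidden = element info { db.info.extra* }

# Each context picks the flavor it needs (assignments are illustrative).
db.para        = element para        { text }
db.imagedata   = element imagedata   { attribute fileref { text } }
db.imageobject = element imageobject { db.info.title.forbidden?, db.imagedata }
db.sidebar     = element sidebar     { db.info.title.opt?, db.para+ }
db.chapter     = element chapter {
    db.info.title.req,
    (db.para | db.sidebar | db.imageobject)+
}</programlisting>

<para>A customizer who wants a fourth flavor would add one more pattern
that also matches <tag>info</tag> and reference it where needed.</para>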

<para xml:id='p13'>And titles are <emphasis>only</emphasis> allowed inside <tag>info</tag>.
You can't have them outside anymore.
</para>

</section>

<section xml:id='s4'>
<title>Discarding Cruft</title>

<para xml:id='p14'>Some stuff just has to go. I have no doubt that for every element in DocBook,
there's a user somewhere. But I believe experience suggests that some of them
are not worth the complexity they carry.</para>

<para xml:id='p15'>My list of candidates for deletion:</para>

<itemizedlist>
<listitem><para xml:id='p16'><tag>msgset</tag>. And, perhaps more controversially,
<tag>simplemsgset</tag>.
</para>
</listitem>
<listitem><para xml:id='p17'><tag>graphic</tag>, <tag>inlinegraphic</tag>,
<tag>graphicco</tag>.
</para>
</listitem>
<listitem><para xml:id='p18'><tag>sgmltag</tag>. Replaced by <tag>xmltag</tag>.
</para>
</listitem>
<listitem><para xml:id='p19'><tag>authorblurb</tag>. Replaced by
<tag>personblurb</tag>.
</para>
</listitem>
<listitem><para xml:id='p20'><tag>toc</tag> and <tag>lot</tag>. Replaced
by much simpler <tag>toc</tag> markup.
</para>
</listitem>
<listitem><para xml:id='p21'><tag>caption</tag>. Maybe we should allow captions on
figures, but allowing them on <tag>mediaobject</tag> is clunky.
</para>
</listitem>
<listitem><para xml:id='p22'><tag>modespec</tag>, <tag>invpartnumber</tag>,
<tag>pubsnumber</tag>, <tag>isbn</tag>, and 
<tag>issn</tag> (use <tag>biblioid</tag>), <tag>structname</tag>,
<tag>structfield</tag>, <tag>medialabel</tag>,
<tag>interface</tag>, <tag>action</tag>, <tag>property</tag>,
<tag>otheraddr</tag>, <tag>contractnum</tag>,
<tag>contractsponsor</tag>, <tag>corpauthor</tag>
(<tag>author</tag> now allows either a
<tag>personname</tag> or an <tag>orgname</tag>),
<tag>corpname</tag> (replaced by <tag>orgname</tag>),
<tag>beginpage</tag> (good riddance!), <tag>ackno</tag>,
<tag>alt</tag>, and <tag>collabname</tag>.</para>
</listitem>
<listitem><para xml:id='p23'>Also <tag>segmentedlist</tag>.
</para>
</listitem>
<listitem><para xml:id='p24'><tag>link</tag>, <tag>olink</tag>,
<tag>ulink</tag>. Replaced by ubiquitous linking. Every element can have
either a <tag class="attribute">linkend</tag> attribute or an
<tag class="attribute">href</tag> attribute.
</para>
</listitem>
<listitem><para xml:id='p25'>Enumerated section elements (<tag>sect1</tag>,
<tag>sect2</tag>, <tag>refsect1</tag>, etc.). Again, these
exist because there was no other way to limit recursive depth in DTDs. In
RELAX NG, you can do it without forcing the author to think about the element
names.
</para>
</listitem>
</itemizedlist>
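
<para>Two of the items above, ubiquitous linking and depth-limited
sections, can be sketched in a few lines of RELAX NG. Every pattern name
below is hypothetical, the content models are reduced to a bare minimum,
and the limit of three levels is chosen only to keep the example
short.</para>

<programlisting># A sketch only; every pattern name below is hypothetical.
start = db.section.1

# Ubiquitous linking: at most one of linkend or href on an element.
db.linking.attributes =
    ( attribute linkend { xsd:IDREF }
    | attribute href    { xsd:anyURI } )?

db.title = element title { db.linking.attributes, text }
db.info  = element info  { db.linking.attributes, db.title }
db.para  = element para  { db.linking.attributes, text }

# Depth-limited sections: three patterns, one element name.
# The author always writes "section"; the grammar counts the levels.
db.section.1 = element section {
    db.linking.attributes, db.info, db.para*, db.section.2*
}
db.section.2 = element section {
    db.linking.attributes, db.info, db.para*, db.section.3*
}
db.section.3 = element section {
    db.linking.attributes, db.info, db.para*
}</programlisting>

<para>With this grammar, a <tag>section</tag> nested four deep is invalid,
as is any element that carries both a
<tag class="attribute">linkend</tag> and an
<tag class="attribute">href</tag>.</para>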

<para xml:id='p26'>Too aggressive, perhaps. Or not aggressive enough. Certainly not a finished,
final list.</para>

</section>

<section xml:id='s5'>
<title>Miscellany</title>

<para xml:id='p27'>Finally, I've made some organizational changes. Some of these are documented
as future-use changes in V4.0; others are not.</para>

<para xml:id='p28'>In no particular order:</para>

<itemizedlist>
<listitem>
<para xml:id='p29'>The components of a personal name (<tag>firstname</tag>,
<tag>surname</tag>, etc.) are no longer allowed free-standing. You have
to wrap them in a <tag>personname</tag>.</para>
</listitem>

<listitem>
<para xml:id='p30'>I've explicitly allowed both CALS and HTML table models. RELAX NG lets us
segregate them so there's no overlap: it's exactly one or exactly the other.
Perhaps HTML tables should (also or only?) be allowed in the XHTML namespace?</para>
</listitem>

<listitem>
<para xml:id='p31'>I removed the <tag class="attribute">format</tag> attribute from
verbatim environments.</para>
</listitem>

<listitem>
<para xml:id='p32'>I dropped the <tag class="attribute">class</tag> attribute from
<tag>productname</tag>.</para>
</listitem>

<listitem>
<para xml:id='p33'>I made <tag>title</tag> mandatory on <tag>equation</tag>.
</para>
</listitem>

<listitem>
<para xml:id='p34'>I removed the <tag class="attribute">srccredit</tag> attribute
from <tag>imagedata</tag> and friends. Those elements now allow
<tag>info</tag> and the credit can more properly go there.</para>
</listitem>

<listitem>
<para xml:id='p35'>I removed <tag>contrib</tag>; use <tag>othercredit</tag>
instead.</para>
</listitem>
</itemizedlist>
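
<para>The segregation of the two table models might look something like
the following sketch. The pattern names are hypothetical and both models
are cut down drastically; where the title of an HTML-style table lives is
left open here.</para>

<programlisting># A sketch only; the pattern names and the cut-down content
# models are hypothetical.
start = db.table

db.title = element title { text }
db.info  = element info  { db.title }

# CALS model: tgroup / tbody / row / entry.
db.entry  = element entry  { text }
db.row    = element row    { db.entry+ }
db.tbody  = element tbody  { db.row+ }
db.tgroup = element tgroup {
    attribute cols { xsd:positiveInteger },
    db.tbody
}
db.cals.table = element table { db.info, db.tgroup+ }

# HTML model: tr / td directly inside table.
db.html.td    = element td    { text }
db.html.tr    = element tr    { db.html.td+ }
db.html.table = element table { db.html.tr+ }

# Exactly one model or the other: no pattern mixes tgroup with tr.
db.table = db.cals.table | db.html.table</programlisting>

<para>Because both alternatives use the element name <tag>table</tag>, the
author never chooses between two names; the children alone decide which
model applies.</para>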

</section>

<section xml:id='s6'>
<title>A Prototype</title>

<para xml:id='p36'>All this work resulted in a <link xlink:href="examples/docbook.rnc">prototype</link>
and a <link xlink:href="examples/convert.xsl">stylesheet</link> that converts (some) DocBook V4.2
documents to conform to the prototype.</para>

<para xml:id='p37'>One important change that I haven't made (<emphasis>yet</emphasis>) is
putting DocBook in a namespace. But we should.</para>

</section>

</essay>
