XML 2010: What's new in DocBook V5?

Volume 13, Issue 37; 20 Oct 2010

I spoke last week at XML 2010. In particular, I spoke about what's new in DocBook V5. Herewith, a few thoughts on XML 2010 and my slides.

The theme of the conference this year, “eMedia Revolution”, seemed like the sort of thing that ought to be very popular. Certainly, there are lots of interesting things being created and discussed in the realm of “eMedia”. Publishing for pads, tabs, and phones seems to be all the rage. I don't think every current trend is a good idea, but that's the sort of thing I was hoping to hear discussed.

Alas, attendance was, uhm, sparse. I had some good conversations and there were a few good talks, but on the whole, I was disappointed.

Any discussion about eMedia publishing involves, at least tangentially, a discussion of modular content and reuse. That's the primary focus of the work that the DocBook Technical Committee is doing on DocBook V5.1, so I thought it would be good to pitch a presentation about our current progress.

My pitch was accepted and the slides are shown below, with a few comments.

What's new in DocBook V5?

Speaking of comments, wasn't this supposed to be a V5.1 talk? Well, yes. And it is, near the end, but several folks I spoke to before the conference suggested a general introduction would be appreciated. Oh, and despite a decade of working on standards committees, I still overestimate the speed at which progress will occur.

What is DocBook?

<foil foilnum="3">
  • DocBook is an XML vocabulary for writing. It is particularly well-suited to books and papers about computer software and hardware, though it is by no means limited to them.

  • It has been subset down to something that resembles HTML; it has been extended to do things as different as websites (docbook.org) and presentations (you're looking at one).

    • General publishing schema subcommittee

    • E-learning schema subcommittee (on hiatus)

  • DocBook documents are mostly hand authored and mostly consumed by humans.

  • DocBook contains a lot of mixed content. Very few elements have “simple content”: dates, numbers, etc.

</foil>
<foil>
<book xmlns="http://docbook.org/ns/docbook"
      version="5.1">
<info>
<title>A Book Title</title>
<author>
  <personname>
    <givenname>John</givenname>
    <surname>Doe</surname>
  </personname>
</author>
</info>
<chapter>
<title>The First Chapter</title>
<para xml:id='p15'>Some <emphasis>text</emphasis>.</para>
</chapter>
</book>
</foil>

DocBook History

<foil foilnum="6">
  • DocBook has been actively maintained for more than a decade.

  • It was originally developed by the Davenport Group. It is now being developed by an OASIS Technical Committee.

  • DocBook was an SGML DTD for many years, then an XML DTD, and now officially a RELAX NG Grammar.

</foil>
<foil>

“DocBook is like a pearl, it grows by accretion.”

Plot of element growth by version
</foil>

The point on this slide is that V5 reduced the number of elements in DocBook. But it's really a setup for the next slide.

<foil>

“DocBook is like a pearl, it grows by accretion.”

Plot of entity growth by version
</foil>

I talked a little bit about DTDs and parameter entities. There aren't any in DocBook V5. Of course, there are RELAX NG patterns, but they don't come with the same level of complexity. Anyway, if you've never seen parameter entities, they look like this:

<foil>
<!ENTITY % chapter.module "INCLUDE">
<![%chapter.module;[
<!ENTITY % local.chapter.attrib "">
<!ENTITY % chapter.role.attrib "%role.attrib;">

<!ENTITY % chapter.element "INCLUDE">
<![%chapter.element;[
<!ELEMENT chapter %ho; (beginpage?,
                    chapterinfo?,
                    (%bookcomponent.title.content;),
                    (%nav.class;)*,
                    tocchap?,
                    (%bookcomponent.content;),
                    (%nav.class;)*)
		%ubiq.inclusion;>
<!--end of chapter.element-->]]>

<!ENTITY % chapter.attlist "INCLUDE">
<![%chapter.attlist;[
<!ATTLIST chapter
		%label.attrib;
		%status.attrib;
		%common.attrib;
		%chapter.role.attrib;
		%local.chapter.attrib;
>
<!--end of chapter.attlist-->]]>
<!--end of chapter.module-->]]>
</foil>
<foil>
  • Growth by accretion resulted in some content models that were at best odd and at worst broken in pretty obvious ways.

  • Over ten years, the scale of DocBook changed. Logically extending decisions that looked regular and consistent when DocBook had 100 elements did not always result in a design that continued to look regular and consistent.

In particular:

  • The DTD failed to capture some significant constraints.

  • Originally designed with exchange in mind, DocBook has largely become an authoring schema. Exchange and authoring aren’t opposing design centers, but they are different.

  • While DocBook was a shining example of parameter entity customization, parameter entity customization is fiendishly hard.

</foil>
<foil>

The result of recasting DocBook should…

  1. “Feel like” DocBook.

  2. Enforce as many constraints as possible.

  3. Clean up the content models.

  4. Give users the flexibility to extend or subset the schema in an easy and straightforward way.

  5. Still allow us to generate XML DTD and W3C XML Schema versions of DocBook.

</foil>
<foil>
  • Uniform info elements

  • Info elements in more contexts

  • Better constraint checking (e.g., titles inside/outside info)

  • Simple co-constraints

  • Untangled tables

  • (Ability to use) a few real datatypes

  • Extra-grammatical constraints (Schematron rules)

  • Hugely simplified customization

</foil>

Speaking of hugely simplified customization, my favorite DocBook DTD customization example: removing procedure elements.

<foil>
<!-- DocBook XML V4.5 No Procedures Subset -->

<!ENTITY % ebnf.block.hook "">
<!ENTITY % local.compound.class "">
<!ENTITY % compound.class
		"msgset|sidebar|qandaset
                 %ebnf.block.hook;
                 %local.compound.class;">

<!ENTITY % procedure.content.module "IGNORE">
<!ENTITY % task.content.module "IGNORE">
<!ENTITY % sidebar.element "IGNORE">
<!ENTITY % qandaset.element "IGNORE">
<!ENTITY % qandadiv.element "IGNORE">
<!ENTITY % question.element "IGNORE">
<!ENTITY % answer.element "IGNORE">
<!ENTITY % revdescription.element "IGNORE">
<!ENTITY % caution.element "IGNORE">
<!ENTITY % important.element "IGNORE">
<!ENTITY % note.element "IGNORE">
<!ENTITY % tip.element "IGNORE">
<!ENTITY % warning.element "IGNORE">

<!ENTITY % docbook.dtd PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
                       "http://docbook.org/xml/4.5/docbookx.dtd">
%docbook.dtd;

<!ENTITY % my.sidebar.mix
		"%list.class;		|%admon.class;
		|%linespecific.class;	|%synop.class;
		|%para.class;		|%informal.class;
		|%formal.class;
		|%genobj.class;
		|%ndxterm.class;        |beginpage
		%local.sidebar.mix;">

<!ELEMENT sidebar (sidebarinfo?,
                   (%formalobject.title.content;)?,
                   (%my.sidebar.mix;)+)>

<!ENTITY % my.qandaset.mix
		"%list.class;           |%admon.class;
		|%linespecific.class;	|%synop.class;
		|%para.class;		|%informal.class;
		|%formal.class;
		|%genobj.class;
		|%ndxterm.class;
		%local.qandaset.mix;">

<!ELEMENT qandaset (blockinfo?, (%formalobject.title.content;)?,
			(%my.qandaset.mix;)*,
                        (qandadiv+|qandaentry+))>

<!ELEMENT qandadiv (blockinfo?, (%formalobject.title.content;)?,
			(%my.qandaset.mix;)*,
			(qandadiv+|qandaentry+))>

<!ELEMENT question (label?, (%my.qandaset.mix;)+)>

<!ELEMENT answer (label?, (%my.qandaset.mix;)*, qandaentry*)>

<!ENTITY % my.revdescription.mix
		"%list.class;		|%admon.class;
		|%linespecific.class;	|%synop.class;
		|%para.class;		|%informal.class;
		|%formal.class;
		|%genobj.class;
		|%ndxterm.class;
		%local.revdescription.mix;">

<!ELEMENT revdescription ((%my.revdescription.mix;)+)>

<!ENTITY % my.admon.mix
		"%list.class;
		|%linespecific.class;	|%synop.class;
		|%para.class;		|%informal.class;
		|%formal.class;		|sidebar
		|anchor|bridgehead|remark
		|%ndxterm.class;        |beginpage
		%local.admon.mix;">

<!ELEMENT caution (title?, (%my.admon.mix;)+)
                      %admon.exclusion;>

<!ELEMENT important (title?, (%my.admon.mix;)+)
                      %admon.exclusion;>

<!ELEMENT note (title?, (%my.admon.mix;)+)
                      %admon.exclusion;>

<!ELEMENT tip (title?, (%my.admon.mix;)+)
                      %admon.exclusion;>

<!ELEMENT warning (title?, (%my.admon.mix;)+)
                      %admon.exclusion;>
</foil>

And here's how you do it in RELAX NG:

<foil>
# DocBook No Procedures Subset

include "docbook.rnc" {
   db.procedure = notAllowed
}
</foil>

DocBook Future

<foil foilnum="16">
  • Improve support for topic-based authoring

  • Fix small bugs as they arise

  • Identify backwards-incompatible changes to be fixed in V6.0

</foil>

There should be something about transclusion on this slide (see below).

<foil>
  • “Traditional” (or narrative) authoring

    A narrative is a story that is created in a constructive format (as a work of writing, …) that describes a sequence of fictional or non-fictional events. It derives from the Latin verb narrare, which means "to recount" and is related to the adjective gnarus, meaning “knowing” or “skilled”… [B]ut can also be used to refer to the sequence of events described in a narrative.

    Wikipedia
  • Topic-based Authoring

    Topic-based authoring is a modular content creation approach (popular in the technical publications and documentation arenas) that supports XML content reuse, content management, and makes the dynamic assembly of personalized information possible.

    Wikipedia
</foil>
<foil>
  • DocBook can only be used for books

  • DocBook content can't be reused

  • DocBook can't support a topic-based writing methodology

  • Topic-based authoring produces better documentation

</foil>
<foil>
  • refentry

  • article

  • section

  • In DocBook V5.1: topic

</foil>

Document assembly

<foil foilnum="21">

Consider:

  • Four products on three platforms

  • with 250 topics

  • of which, about 200 are shared between any two products

Document assembly is the process by which each of the twelve products is composed from the library of 600+ topics.

</foil>
<foil>
  • XInclude

  • Entities

  • (Your favorite ad-hoc solution)

</foil>
<foil>
  • Requires careful (author) management of source documents

  • Inclusion mechanisms are inflexible; no support for

    • Multiple levels of hierarchy

    • More general transformation

    • Managing ID/IDREF values

</foil>
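
To make the ID/IDREF point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementInclude`. The in-memory `LIBRARY`, the file name `topic.xml`, and the helpers `load` and `duplicate_ids` are assumptions made for illustration; none of them come from DocBook or its tooling. Including the same topic twice with plain XInclude leaves two copies of the same `xml:id` in the result, and nothing in the inclusion mechanism notices.

```python
from collections import Counter
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical topic library, kept in memory instead of on disk.
LIBRARY = {
    "topic.xml": '<section xmlns="http://docbook.org/ns/docbook" xml:id="install">'
                 "<title>Installing</title></section>",
}

# A book that pulls the same topic into two chapters.
BOOK = """<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <chapter><xi:include href="topic.xml"/></chapter>
  <chapter><xi:include href="topic.xml"/></chapter>
</book>"""


def load(href, parse, encoding=None):
    """Loader for ElementInclude that reads from LIBRARY instead of the file system."""
    return ET.fromstring(LIBRARY[href])


def duplicate_ids(root):
    """Return the xml:id values that occur more than once under root."""
    counts = Counter(e.get(XML_ID) for e in root.iter() if e.get(XML_ID))
    return sorted(value for value, n in counts.items() if n > 1)


book = ET.fromstring(BOOK)
ElementInclude.include(book, loader=load)  # expands both xi:include elements in place
dupes = duplicate_ids(book)
print(dupes)  # ['install']
```

Fixing this by hand means rewriting IDs (and every IDREF that points at them) per inclusion, which is exactly the kind of bookkeeping the slide says authors are left to manage.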
<foil>
  • Resources can be managed independently: topic authors don't need to understand how topics are composed to build products.

  • Automatic management of hierarchies

  • Possibly support for additional, general transformations

    • Build an assembly from a mixture of vocabularies

</foil>
<foil>
  • Combine a collection of resources

  • Using an (arbitrary) order and nesting suitable for the kind of output needed

  • To produce a valid DocBook document

    • Or a valid document in some appropriate customization

    • Explicitly: transforming the assembled document into the actual output format (PDF, eBook, web pages, help system) is the responsibility of another process.

</foil>

There are some folks who think this conceptual model, building an intermediate document that is then processed, is flawed. Eliot Kimber agreed. I'm likely to be persuaded away from it, though I do like its conceptual simplicity.

<foil>
  • Define a vocabulary for declaring the assembly of topics:

<assembly xmlns="http://docbook.org/ns/docbook"
          version="5.1">
  <resources>…</resources>
  <structure>…</structure>
  <relationships>…</relationships>
  <transforms>…</transforms>
</assembly>
</foil>
<foil>
  • Identify the resources

  • Combine them into structures (i.e. products)

  • Describe additional relationships

  • Perhaps identify transformations

N.B. This presentation is based on DocBook V5.1 beta 2; some of the details are bound to change.

</foil>
<foil>
<resources xml:base="/some/base/">
  <resource xml:id="simple" fileref="simple.xml"/>

  <resource xml:id="topicA" fileref="topicA.xml"
            xpointer="element(A)"/>

  <resource xml:id="topicB" fileref="topicA.xml"
            xpointer="element(B)"/>

  <resource xml:id="toc">
    <toc/>
    <toc role="procedures"/>
  </resource>
</resources>
</foil>
<foil>
simple => <article>...

topicA => <book>...<chapter>...<section xml:id="A">...

topicB => <book>...<part>...<topic xml:id="B">....

toc    => <toc/>, <toc role="procedures"/>
</foil>
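
A rough sketch of how a processor might resolve those `fileref` and `xpointer` pairs, written in Python with only the standard library. The `FILES` dictionary stands in for the documents under `/some/base/`, and `resolve` and `local_name` are hypothetical helpers; only the `element(ID)` form used on the slide is handled, and IDs are assumed to be carried by `xml:id`.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical stand-in for the files under /some/base/.
FILES = {
    "topicA.xml": """<book xmlns="http://docbook.org/ns/docbook">
  <chapter><section xml:id="A"><title>Topic A</title></section></chapter>
  <part><topic xml:id="B"><title>Topic B</title></topic></part>
</book>""",
}


def resolve(fileref, xpointer=None):
    """Return the element a <resource> points at.

    Without an xpointer the whole document is the resource.
    Only the element(ID) form is supported in this sketch.
    """
    root = ET.fromstring(FILES[fileref])
    if xpointer is None:
        return root
    if not (xpointer.startswith("element(") and xpointer.endswith(")")):
        raise ValueError(f"unsupported xpointer: {xpointer}")
    wanted = xpointer[len("element("):-1]
    for elem in root.iter():
        if elem.get(XML_ID) == wanted:
            return elem
    raise LookupError(f"no element with xml:id={wanted!r} in {fileref}")


def local_name(elem):
    """Strip the namespace part from an ElementTree tag."""
    return elem.tag.rsplit("}", 1)[-1]


topic_a = resolve("topicA.xml", "element(A)")
topic_b = resolve("topicA.xml", "element(B)")
print(local_name(topic_a), local_name(topic_b))  # section topic
```

Two resources can point into the same file, as `topicA` and `topicB` do here, because the resource is identified by the pair of file and pointer rather than by the file alone.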
<foil>
<structure xml:id="user-guide">
  <output renderas="book"/>
  <override>
    <title>Widget User Guide</title>
  </override>
  <module resourceref="toc"/>
  <module>
    <output renderas="chapter"/>
    <override>
      <title>Chapter Title</title>
    </override>
    <module resourceref="topicA"/>
    <module resourceref="topicB"/>
  </module>
</structure>
</foil>
<foil>
<book xml:id="user-guide">
  <info>
    <title>Widget User Guide</title>
  </info>
  <toc/>
  <toc role="procedures"/>
  <chapter>
    <info>
      <title>Chapter Title</title>
    </info>
    <section xml:id="A">...</section>
    <topic xml:id="B">...</topic>
  </chapter>
</book>
</foil>
<foil>
<structure xml:id="user-guide">
  <output renderas="book"/>
  <override>
    <title>Widget User Guide</title>
  </override>
  <module resourceref="toc"/>
  <module>
    <output renderas="chapter"/>
    <override>
      <title>Chapter Title</title>
    </override>
    <module resourceref="topicA">
      <output renderas="section"/>
    </module>
    <module resourceref="topicB">
      <output renderas="section"/>
    </module>
  </module>
</structure>
</foil>
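
The mapping from a structure to an assembled document can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TC's processing model: `RESOURCES` holds already-resolved, made-up content, `render` and `q` are hypothetical helpers, and the reading of `renderas` on a leaf module as "rename the root of the resource" is my interpretation of the slide above.

```python
import copy
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"


def q(name):
    """Qualify a local name with the DocBook namespace."""
    return f"{{{DB}}}{name}"


# Hypothetical, already-resolved resources keyed by their xml:id.
RESOURCES = {
    "toc": [ET.fromstring(f'<toc xmlns="{DB}"/>'),
            ET.fromstring(f'<toc xmlns="{DB}" role="procedures"/>')],
    "topicA": [ET.fromstring(f'<section xmlns="{DB}"><title>A</title></section>')],
    "topicB": [ET.fromstring(f'<topic xmlns="{DB}"><title>B</title></topic>')],
}

STRUCTURE = f"""<structure xmlns="{DB}">
  <output renderas="book"/>
  <override><title>Widget User Guide</title></override>
  <module resourceref="toc"/>
  <module>
    <output renderas="chapter"/>
    <override><title>Chapter Title</title></override>
    <module resourceref="topicA"><output renderas="section"/></module>
    <module resourceref="topicB"><output renderas="section"/></module>
  </module>
</structure>"""


def render(node):
    """Turn a <structure> or <module> into a list of output elements."""
    output = node.find(q("output"))
    renderas = output.get("renderas") if output is not None else None
    ref = node.get("resourceref")

    if ref is not None:
        # Leaf module: copy the resource, renaming its root if asked to.
        elems = [copy.deepcopy(e) for e in RESOURCES[ref]]
        if renderas:
            for e in elems:
                e.tag = q(renderas)
        return elems

    # Container: render the child modules in document order.
    children = []
    for child in node.findall(q("module")):
        children.extend(render(child))
    if renderas is None:
        return children  # nothing to wrap them in

    wrapper = ET.Element(q(renderas))
    override = node.find(q("override"))
    if override is not None:
        info = ET.SubElement(wrapper, q("info"))
        info.extend([copy.deepcopy(c) for c in override])
    wrapper.extend(children)
    return [wrapper]


book = render(ET.fromstring(STRUCTURE))[0]
shape = [e.tag.rsplit("}", 1)[-1] for e in book]
chapter_shape = [e.tag.rsplit("}", 1)[-1] for e in book.find(q("chapter"))]
print(shape)          # ['info', 'toc', 'toc', 'chapter']
print(chapter_shape)  # ['info', 'section', 'section']
```

With the `renderas="section"` overrides in place, the `topic` resource comes out as a `section`, so the chapter contains two sections rather than a section followed by a topic.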
<foil>
<relationship type="seealso">
  <instance resourceref="tut1"/>
  <instance resourceref="tut2"/>
  <instance resourceref="task1"/>
</relationship>

<relationship type="path">
  <info>
    <title>New User Introduction</title>
  </info>
  <instance resourceref="over1"/>
  <instance resourceref="over2"/>
  <instance resourceref="task3"/>
  <instance resourceref="cleanup"/>
</relationship>
</foil>
<foil>
<transforms>
  <transform name="dita2docbook" type="text/xsl"
             fileref="dita2db.xsl"/>
  <transform name="tutorial" type="text/xsl"
             fileref="db2tutorial.xsl"/>
  <transform name="art2pi" type="text/xsl"
             fileref="art2pi.xsl"/>
  <transform name="office"
             type="application/xproc+xml"
             fileref="office2db.xpl"/>
  <transform name="office" type="text/xsl"
             fileref="extractoffice.xsl"/>
</transforms>
</foil>
<foil>
<resource xml:id="overview"
          fileref="dita/over.xml"
          transform="dita2docbook"/>

…

<module resourceref="overview">
  <transform name="art2pi"/>
</module>
</foil>
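
One way to picture the chaining is as a small pipeline: the transform named on the resource runs first, then any transform named on the module. That ordering is my assumption, not something the beta schema spells out. In the Python sketch below the transforms are placeholder callables that only record their names; a real processor would run the XSLT or XProc that each name refers to. `stub`, `TRANSFORMS`, and `run_pipeline` are hypothetical names, and the `concept` input is made-up sample content.

```python
import xml.etree.ElementTree as ET


def stub(name):
    """Build a placeholder transform that records its name on the root element."""
    def apply(elem):
        done = elem.get("applied", "")
        elem.set("applied", f"{done} {name}".strip())
        return elem
    return apply


# Hypothetical registry: transform name -> callable.
TRANSFORMS = {name: stub(name) for name in ("dita2docbook", "art2pi")}


def run_pipeline(elem, resource_transform=None, module_transforms=()):
    """Apply the resource's transform (if any), then the module's, in order."""
    names = ([resource_transform] if resource_transform else []) + list(module_transforms)
    for name in names:
        elem = TRANSFORMS[name](elem)
    return elem


overview = ET.fromstring("<concept><title>Overview</title></concept>")
result = run_pipeline(overview, "dita2docbook", ["art2pi"])
print(result.get("applied"))  # dita2docbook art2pi
```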
<foil>
  • An assembly declares the structure of a document

  • Interpreting an assembly requires processing a set of resources (documents or document fragments)

  • A structure is the concatenation of a set of modules, each of which may have been transformed

  • A module is the concatenation of a set of resources or other modules, each of which may have been transformed

  • In other words, an assembly describes a pipeline of operations that must be performed to build the assembled document

</foil>

I spoke about interpreting assemblies at XML Prague earlier this year.

<foil>
  • ID/IDREF management...

  • Profiling...

  • The DocBook TC is investigating the general problems of transclusion

    • Focus is on transclusion of DocBook documents

    • But the problem is clearly much broader

</foil>

The TC is also hard at work on the transclusion problem. Or, more accurately, Jirka Kosek has been hard at work. I think the TC will have some public drafts in this area very shortly.

I was running out of time so I didn't actually say much about it at the conference.


There was some lively conversation and, all in all, I thought the talk went pretty well.