<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       version="pto">
<info>
<title>Annotation Markup</title>
<volumenum>7</volumenum>
<issuenum>166</issuenum>
<pubdate>2004-09-16T07:37:39-04:00</pubdate>
<date>$Date: 2005-10-17 09:59:18 -0400 (Mon, 17 Oct 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2004</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>A few days ago, I demonstrated some experimental annotation
formatting. What I didn’t really do was talk about the source markup
for annotations. This essay attempts to address that issue.
</para>
</abstract>
</info>

<para xml:id='p1'>A few days ago, I demonstrated some experimental
<link xlink:href="/2004/09/10/annotations">annotation formatting</link>. What I didn’t
really do was talk about the source markup for annotations.
From our discussions at the DocBook <acronym>TC<alt>Technical
Committee</alt></acronym>
<link xlink:href="http://lists.oasis-open.org/archives/docbook/200409/msg00023.html">meeting
yesterday</link>, it’s pretty clear there are a few ways to slice the problem.</para>

<para xml:id='p2'>Annotations divide, functionally, into two classes:
“inline” and “block”. Inline annotations are one possible solution for
a set of accessibility issues: providing alternate text for graphical
elements and other places where short, basically text-only, “tool tip
style” rendering is appropriate. Expansions for abbreviations and
acronyms and perhaps translations for <phrase
annotations="ann.fp"><tag>foreignphrase</tag>s</phrase> fall into this
category.</para>

<annotation xml:id="ann.fp">
<para xml:id='p3'>The <tag>foreignphrase</tag> element can be used to
mark up the text of a foreign word or phrase. <quote>Foreign</quote> in
this context means only that the text is in a language other than the primary
language of the document; it is not intended to be pejorative in any
way.</para></annotation>

<para xml:id='p4'>Block annotations are more like footnotes. They can contain a range
of block elements (paragraphs, lists, even tables) and have to be rendered
in some other way; they aren’t “tool tips”.</para>

<para xml:id='p5'>For my experiments, I chose a single <tag>annotation</tag> element and
used an attribute to distinguish between the two classes:</para>

<programlisting>…DocBook &lt;acronym>TC&lt;annotation class="inline">Technical
Committee&lt;/annotation>&lt;/acronym>…</programlisting>

<para xml:id='p6'>The class values “inline” and “block” aren’t really suitable for
production use; they were just convenient for my experiments. In real life,
I think we’ll want to use things like “title” or even “expansion”. I don’t
know exactly what the list should be though.</para>
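
<para>For contrast, here’s a sketch of what a block annotation might look like
under the same experimental scheme. Only the <tag>annotation</tag> element and
its <tag class="attribute">class</tag> attribute come from my experiments; the
attachment point and the content are purely illustrative, and whether the
experimental schema allows an annotation at this exact point is an assumption:</para>

<programlisting>…the &lt;phrase>content model&lt;annotation class="block">
&lt;para>A content model describes what an element may contain.&lt;/para>
&lt;itemizedlist>
&lt;listitem>&lt;para>Inline content, such as text and phrases.&lt;/para>&lt;/listitem>
&lt;listitem>&lt;para>Block content, such as paragraphs and lists.&lt;/para>&lt;/listitem>
&lt;/itemizedlist>
&lt;/annotation>&lt;/phrase>…</programlisting>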

<para xml:id='p7'>One question is whether the functional distinction in annotations should be exposed
to the author by using two elements, analogous to <tag>inlinemediaobject</tag>
and <tag>mediaobject</tag>. The problem is, that’s not a very good analogy.
It’s true that the two forms of media object have different processing expectations,
but <tag>inlinemediaobject</tag> exists as a distinct element in order to
<emphasis>prevent</emphasis> “block” media objects from appearing in inline
contexts (like titles and phrases)<footnote><para xml:id='p8'>The
<tag>inlinemediaobject</tag> element is
allowed basically everywhere text is because it’s the hook that authors can
use to deal with glyphs that aren’t in the fonts they use, or even in Unicode.</para>
</footnote>.
Either style of annotation, on the other hand, can occur in almost all contexts.
Certainly block annotations can, in general, occur in inline contexts.</para>

<para xml:id='p9'>A better analogy, in some sense, is between <tag>para</tag> and
<tag>simpara</tag>: <tag>para</tag> can contain block-level markup
(like <tag>programlisting</tag> and <tag>table</tag>), whereas <tag>simpara</tag>
cannot. But my experience is that authors don’t use <tag>simpara</tag> very
much unless customizers force them to do so.</para>

<para xml:id='p10'>So, one of the tensions we face is over the annotation element’s
name(s). Is there significant value in terms of explaining to authors
and getting them to use it correctly if there are two elements for
this purpose? Or does the fact that these two elements would
<emphasis>almost</emphasis> always appear together in every content
model mean that it makes more sense to have a single element and
distinguish between the inline and block cases simply by the element’s
content?</para>

<para xml:id='p11'>It turns out that there’s a precedent for that too. When we needed to provide
alternate text for media objects, we observed that we already had a
<tag>textobject</tag> element in the content model for media objects, so we extended
it to allow either <tag>phrase</tag> or block level markup and decreed that the
processing expectation for a text object containing only a phrase was that it
was the “inline” alternate text for the image. (A text object containing block
level markup might be used as the long description of an image.)</para>
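
<para>As a sketch, a media object that uses both forms might look something
like this. The file name and the descriptions are made up for illustration:</para>

<programlisting>&lt;mediaobject>
  &lt;imageobject>
    &lt;imagedata fileref="figures/layout.png"/>
  &lt;/imageobject>
  &lt;!-- phrase only: the short, “inline” alternate text -->
  &lt;textobject>
    &lt;phrase>Diagram of the page layout&lt;/phrase>
  &lt;/textobject>
  &lt;!-- block content: a candidate long description -->
  &lt;textobject>
    &lt;para>The page is divided into a header, two columns, and a footer.&lt;/para>
  &lt;/textobject>
&lt;/mediaobject></programlisting>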

<para xml:id='p12'>In retrospect, I’m not sure this was a really good idea. The markup is
a bit awkward and the distinction is pretty subtle to explain. Part of the awkwardness
of the content model:</para>

<programlisting>&lt;!ELEMENT textobject (phrase|(para|orderedlist|...))&gt;</programlisting>

<para xml:id='p13'>is related to shortcomings in XML DTD constraints. When I built my
experimental annotation markup, I left out the clumsy “phrase or blocks”
dichotomy, choosing instead a simple:</para>

<programlisting>element db:annotation {
   ((text|inlines) | blocks)+
}</programlisting>

<para xml:id='p14'>model. This won’t translate well to a DTD, but I
haven’t decided if I care or not. Certainly forcing an extra
<tag>phrase</tag> in there just to avoid <phrase
annotations="ann.pmc">pernicious mixed content</phrase> seems like a
burden.</para>

<annotation xml:id="ann.pmc">
<title>Pernicious Mixed Content</title>
<para xml:id='p15'>A content model that allows a mixture of #PCDATA
and block elements exhibits a nasty peculiarity that we call
“pernicious mixed content”. In SGML, it manifested as markup errors.
In XML, the rules for content models were changed and the errors can’t
occur any more, but a content model that exhibits this problem can’t
prevent unadorned text content from occurring in places where it
shouldn’t be allowed, such as between paragraphs.</para></annotation>
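
<para>To make that concrete, here’s a sketch of an instance that the simple
pattern above would accept, assuming <literal>blocks</literal> includes
<tag>para</tag>. The stray text between the paragraphs is exactly the kind of
content that such a model can’t rule out:</para>

<programlisting>&lt;annotation>
&lt;para>First paragraph.&lt;/para>
Unadorned text that probably shouldn’t be here.
&lt;para>Second paragraph.&lt;/para>
&lt;/annotation></programlisting>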


<para xml:id='p16'>Then there’s the unanswered question of where annotations should be allowed.
I’m inclined to think the correct answer is “everywhere text is allowed”.
<personname><firstname>Dick</firstname><surname>Hamilton</surname></personname>
pointed out one place where we might not allow block annotations: inside link
elements. The point being that a hypertext link inside a hypertext link is going
to be difficult to render and potentially confusing to authors and readers.
Maybe more experience will suggest other limits.</para>
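
<para>A sketch of the case in question, using my experimental markup. The URI
and the text are placeholders, and the example assumes that a block annotation
is itself presented to the reader as a hypertext link:</para>

<programlisting>&lt;link xlink:href="http://example.com/spec">the specification&lt;annotation class="block">
&lt;para>If this note is rendered as a link to a popup or an endnote, the
reader is faced with a link nested inside a link.&lt;/para>
&lt;/annotation>&lt;/link></programlisting>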

<para xml:id='p17'>I suppose the next thing to do is think more explicitly about the
processing expectations for annotations. In my experiments, I’ve only allowed
them in a few places. Allowing them everywhere, and processing them in a reasonable
way wherever they occur, may be a significant processing challenge.</para>

</essay>
