Superseded by Annotations Revisited, 02 Dec 2004.

Annotation Markup

Volume 7, Issue 166; 16 Sep 2004; last modified 08 Oct 2010

A few days ago, I demonstrated some experimental annotation formatting. What I didn’t really do was talk about the source markup for annotations. This essay attempts to address that issue.

A few days ago, I demonstrated some experimental annotation formatting. What I didn’t really do was talk about the source markup for annotations. From our discussions at the DocBook TC meeting yesterday, it’s pretty clear there are a few ways to slice the problem.

Annotations divide, functionally, into two classes: “inline” and “block”. Inline annotations are one possible solution for a set of accessibility issues: providing alternate text for graphical elements and other places where short, basically text-only, “tool tip style” rendering is appropriate. Expansions for abbreviations and acronyms and perhaps translations for foreignphrases fall into this category. (The foreignphrase element can be used to mark up the text of a foreign word or phrase. “Foreign” in this context means that it is a language other than the primary language of the document and is not intended to be pejorative in any way.)

Block annotations are more like footnotes. They can contain a range of block elements (paragraphs, lists, even tables) and have to be rendered in some other way; they aren’t “tool tips”.

For my experiments, I chose a single annotation element and used an attribute to distinguish between the two classes:

…DocBook <acronym>TC<annotation class="inline">Technical

The class values “inline” and “block” aren’t really suitable for production use; they were just convenient for my experiments. In real life, I think we’ll want to use things like “title” or even “expansion”. I don’t know exactly what the list should be, though.

One question is whether the functional distinction in annotations should be exposed to the author by using two elements, analogous to inlinemediaobject and mediaobject. The problem is, that’s not a very good analogy. It’s true that the two forms of media object have different processing expectations, but inlinemediaobject exists as a distinct element in order to prevent “block” media objects from appearing in inline contexts (like titles and phrases). (The inlinemediaobject element is allowed basically everywhere text is because it’s the hook that authors can use to deal with glyphs that aren’t in the fonts they use, or even in Unicode.) Either style of annotation, on the other hand, can occur in almost all contexts. Certainly block annotations can, in general, occur in inline contexts.

A better analogy, in some sense, is between para and simpara: para can contain block-level markup (like programlisting and table), whereas simpara cannot. But my experience is that authors don’t use simpara very much unless customizers force them to do so.

So, one of the tensions we face is over the annotation element’s name(s). Is there significant value in terms of explaining to authors and getting them to use it correctly if there are two elements for this purpose? Or does the fact that these two elements would almost always appear together in every content model mean that it makes more sense to have a single element and distinguish between the inline and block cases simply by the element’s content?
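To make the single-element option concrete, here is a minimal sketch of how a processor might infer the class from an annotation's content alone. The `annotation_kind` function and the `BLOCK_ELEMENTS` set are assumptions made for illustration (an unnamespaced, partial list of DocBook block elements), not part of any DocBook stylesheet.

```python
import xml.etree.ElementTree as ET

# Assumption: an illustrative, incomplete subset of DocBook block-level elements.
BLOCK_ELEMENTS = {"para", "simpara", "orderedlist", "itemizedlist",
                  "table", "programlisting"}


def annotation_kind(annotation):
    """Return "block" if any direct child is block-level, otherwise "inline"."""
    for child in annotation:
        if child.tag in BLOCK_ELEMENTS:
            return "block"
    return "inline"


# Hypothetical sample annotations, not taken from a real document.
inline = ET.fromstring(
    "<annotation>Technical <emphasis>Committee</emphasis></annotation>")
block = ET.fromstring(
    "<annotation><para>A longer note.</para><orderedlist/></annotation>")

print(annotation_kind(inline))
print(annotation_kind(block))
```

With this approach the author never states the class; the cost is that the processing expectation depends on a rule the author has to learn.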

It turns out that there’s a precedent for that too. When we needed to provide alternate text for media objects, we observed that we already had a textobject element in the content model for media objects, so we extended it to allow either phrase or block level markup and decreed that the processing expectation for a text object containing only a phrase was that it was the “inline” alternate text for the image. (A text object containing block level markup might be used as the long description of an image.)

In retrospect, I’m not sure this was a really good idea. The markup is a bit awkward and the distinction is pretty subtle to explain. Part of the awkwardness of the content model:

<!ELEMENT textobject (phrase|(para|orderedlist|...))>

is related to shortcomings in XML DTD constraints. When I built my experimental annotation markup, I left out the clumsy “phrase or blocks” dichotomy, choosing instead a simple:

element db:annotation {
   ((text|inlines) | blocks)+
}

model. This won’t translate well to a DTD, but I haven’t decided if I care or not. Certainly forcing an extra phrase in there just to avoid pernicious mixed content seems like a burden. (A content model that allows a mixture of #PCDATA and block elements exhibits a nasty peculiarity that we call “pernicious mixed content”. In SGML, it manifested as markup errors. In XML the rules for content models were changed and the errors can’t occur any more, but a content model that exhibits this problem can’t prevent unadorned text content from occurring in places where it shouldn’t be allowed, such as between paragraphs.)
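Since a model like this cannot forbid bare text between blocks, a separate check would have to catch it. The following is only a sketch under assumptions: `stray_text` and the `BLOCK_ELEMENTS` set are hypothetical names, and the sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Assumption: an illustrative, incomplete subset of DocBook block-level elements.
BLOCK_ELEMENTS = {"para", "orderedlist", "itemizedlist", "table"}


def stray_text(element):
    """List bare text runs that sit directly beside block-level children."""
    if not any(child.tag in BLOCK_ELEMENTS for child in element):
        return []  # purely inline content, so text is expected here
    # Text directly inside the element is its own .text plus each child's .tail.
    runs = [element.text] + [child.tail for child in element]
    return [run.strip() for run in runs if run and run.strip()]


ok = ET.fromstring(
    "<annotation><para>One.</para>\n<para>Two.</para></annotation>")
bad = ET.fromstring(
    "<annotation><para>One.</para> oops <para>Two.</para></annotation>")

print(stray_text(ok))
print(stray_text(bad))
```

Whitespace between blocks is ignored; only non-blank text beside a block element is reported.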

Then there’s the unanswered question of where annotations should be allowed. I’m inclined to think the correct answer is “everywhere text is allowed”. Dick Hamilton pointed out one place where we might not allow block annotations: inside link elements. The point being that a hypertext link inside a hypertext link is going to be difficult to render and potentially confusing to authors and readers. Maybe more experience will suggest other limits.
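If that limit were adopted but not enforced by the schema, a lint pass could flag the offending cases. This is a rough sketch only: the `LINK_ELEMENTS` and `BLOCK_ELEMENTS` sets and the `block_annotations_in_links` function are assumptions for illustration, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Assumptions: illustrative, incomplete lists of link and block element names.
LINK_ELEMENTS = {"link", "ulink"}
BLOCK_ELEMENTS = {"para", "orderedlist", "itemizedlist", "table"}


def block_annotations_in_links(root):
    """Collect annotations with block content that occur inside a link element."""
    found = []
    for element in root.iter():
        if element.tag not in LINK_ELEMENTS:
            continue
        for annotation in element.iter("annotation"):
            if any(child.tag in BLOCK_ELEMENTS for child in annotation):
                found.append(annotation)
    return found


doc = ET.fromstring(
    '<para>See <ulink url="http://example.org/">the site'
    "<annotation><para>Long note.</para></annotation></ulink> and "
    "<acronym>TC<annotation><para>Another note.</para></annotation></acronym>."
    "</para>"
)

flagged = block_annotations_in_links(doc)
print(len(flagged))
```

Only the annotation inside the ulink is flagged; the block annotation on the acronym is left alone.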

I suppose the next thing to do is think more explicitly about the processing expectations for annotations. In my experiments, I’ve only allowed them in a few places. Allowing them everywhere, and processing them in a reasonable way wherever they occur, may be a significant processing challenge.


I like the idea of a single element, with or without structural content. Rationale: it's simpler. Structure is only needed on occasion. Then I could go on at great length explaining (for instance) some geekish phrase/word, with a structured explanation, or simply expand an acronym to its full meaning.
I'm not altogether happy about the markup being 'within' the item being discussed, but can't put forward any rational explanation, so I'll see how it pans out. Could be that I'm too used to seeing markup around the content being marked up?

—Posted by Dave Pawson on 16 Sep 2004 @ 06:48 UTC #

Dave, I think I can understand your position. I am also not completely reconciled to the proposed syntax for annotations. Currently, annotations are placed *inside* the elements that are annotated. This makes it hard to tell what is really being annotated. I think I would prefer something like:

  <description>eXtensible Markup Language</description>
This approach explicitly marks up both the annotation and the object that is annotated. Processing would also be very easy: if you want the text of the abbreviation, you simply ask for the content of the abbrev element. That is much easier than with the structure proposed by Norm, where the content of the abbrev element contains both the abbreviation and its annotation, so to extract the abbreviation text you must concatenate the text values of all nodes except the nested annotation element nodes.
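For the nested structure, that extraction step might look roughly like the sketch below. The `text_without_annotations` helper is a hypothetical name, and the sample element simply follows the nested pattern from the essay with an abbrev in place of an acronym.

```python
import xml.etree.ElementTree as ET


def text_without_annotations(element):
    """Concatenate an element's text, skipping nested annotation subtrees."""
    parts = [element.text or ""]
    for child in element:
        if child.tag != "annotation":
            parts.append(text_without_annotations(child))
        # Text following a child belongs to the parent, so keep it either way.
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical sample in the nested style: the annotation sits inside abbrev.
abbrev = ET.fromstring(
    '<abbrev>XML<annotation class="inline">eXtensible Markup Language'
    "</annotation></abbrev>"
)

print(text_without_annotations(abbrev))
```

The recursion is short, but every consumer of the abbreviation text has to remember to apply it, which is the cost being described here.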

Of course, my approach also has its drawbacks. Almost all schema languages (except RNG, I think) are unable to express that annotation/abbrev is accepted only in places where abbrev is accepted, while annotation/ulink is permitted only in places where ulink is normally permitted, and so on.

Maybe someone will come up with some completely new miracle markup for annotations. I hope so.

—Posted by Jirka Kosek on 17 Sep 2004 @ 11:44 UTC #


You echo some of the core questions that I've had pertaining to DocBook in general. I will mention them here, although they are off-topic with respect to annotations. If you feel their discussion should be taken to the DocBook list, well, we can do that.

What is the purpose of the distinction between para and simpara? By a given author's choice, a para can reduce to a simpara (which happens fairly often), or can be semantically richer when needed. What's the semantic value of providing a named differentiation? The question extends to the other examples you highlight; if an img is adequate for HTML, and interpretable as being block or inline according to context, why can't DocBook do the same for mediaobject? Essentially, why can't its caption be rendered or not depending on the context (block or inline) of the mediaobject? Relating it back to annotations, why can't one serve both functions according to context? There may be (read: probably is) an obvious use-case here that I'm missing, but I'd like to see it and reflect upon it.

—Posted by John L. Clark on 18 Sep 2004 @ 01:02 UTC #