A Namespace for CALS Tables?

Volume 13, Issue 18; 12 May 2010; last modified 08 Oct 2010

CALS table markup is shared across many XML schemas. Does it make sense to create a namespace for this common vocabulary?

In the past few months, I've been approached independently (I believe) by two different groups exploring the possibility of creating a namespace for CALS table markup.

Namespace deniers and folks who believe that only one, centrally-managed vocabulary is required for the entire corpus of human discourse will recoil in horror (or ridicule), I'm sure.

I think it's an interesting idea. It would not be technically difficult to describe the vocabulary in a modern schema language. The historical difficulty with a common schema for CALS table markup is that the leaves (the contents of the cells themselves) need to be able to contain the “host vocabulary” markup. For example, a DocBook chapter (db:chapter) might contain a CALS table that ultimately contains a CALS table cell (cals:entry), but that cell must be able to contain a DocBook link (db:link).
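To make the shape of the problem concrete, here is a minimal sketch of such a document, parsed with Python's standard library. The DocBook and XLink namespace URIs are the real ones; the CALS namespace URI is purely hypothetical (no such namespace has been defined), and the table is stripped down to the bare minimum rather than being a valid DocBook table.

```python
import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"
XLINK = "http://www.w3.org/1999/xlink"
CALS = "http://example.org/ns/cals-table"  # hypothetical: no CALS namespace exists

DOC = f"""\
<db:chapter xmlns:db="{DB}" xmlns:cals="{CALS}" xmlns:xlink="{XLINK}">
  <db:title>Example</db:title>
  <cals:table>
    <cals:tgroup cols="1">
      <cals:tbody>
        <cals:row>
          <cals:entry>See <db:link xlink:href="http://docbook.org/">DocBook</db:link>.</cals:entry>
        </cals:row>
      </cals:tbody>
    </cals:tgroup>
  </cals:table>
</db:chapter>
"""

root = ET.fromstring(DOC)

# Collect host-vocabulary links that sit inside CALS table cells.
links_in_cells = [
    link
    for entry in root.iter(f"{{{CALS}}}entry")
    for link in entry.iter(f"{{{DB}}}link")
]

for link in links_in_cells:
    print(link.text, link.get(f"{{{XLINK}}}href"))
```

The cell belongs to the table vocabulary, but its content belongs to the host; any shared schema has to leave that hole open.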

What would be gained from a CALS table namespace?

  1. XML editors would be able to recognize tables in any vocabulary and switch to an appropriate table editor widget.

  2. XML viewers would be able to recognize tables in any vocabulary and display them as such. A web browser, assuming (perhaps foolishly) that future web browsers do something rational with namespaces, could have a built-in set of CSS rules for CALS table elements, for example.

  3. Reusable modules (for example, XSLT stylesheets) could be written for CALS tables and simply imported by vocabularies that support them.

  4. Putting CALS tables in a separate namespace would remove the semantic collision that occurs when CALS tables and other table models (for example, HTML) are both added to the same vocabulary.

  5. Authors, when encountering table-like elements in a vocabulary with which they are not familiar, would be able to know for sure that the semantics of the markup really are the CALS semantics.
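As a rough illustration of the first three points, the sketch below finds CALS tables by namespace alone, without knowing anything about the host vocabulary. The CALS namespace URI, the "report" vocabulary, and the function name table_shapes are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

CALS = "http://example.org/ns/cals-table"  # hypothetical namespace URI


def q(name):
    """Qualify a local name with the (hypothetical) CALS namespace."""
    return f"{{{CALS}}}{name}"


def table_shapes(xml_text):
    """Return (row count, declared column count) for each tgroup of each
    CALS table in the document, whatever the host vocabulary is."""
    root = ET.fromstring(xml_text)
    shapes = []
    for table in root.iter(q("table")):
        for tgroup in table.iter(q("tgroup")):
            rows = len(list(tgroup.iter(q("row"))))
            shapes.append((rows, int(tgroup.get("cols"))))
    return shapes


DOCBOOK_DOC = f"""\
<section xmlns="http://docbook.org/ns/docbook" xmlns:c="{CALS}">
  <title>Prices</title>
  <c:table><c:tgroup cols="2"><c:tbody>
    <c:row><c:entry>Tea</c:entry><c:entry>2</c:entry></c:row>
    <c:row><c:entry>Coffee</c:entry><c:entry>3</c:entry></c:row>
  </c:tbody></c:tgroup></c:table>
</section>
"""

# A made-up host vocabulary that also embeds CALS tables.
REPORT_DOC = f"""\
<report xmlns="http://example.org/ns/report" xmlns:c="{CALS}">
  <c:table><c:tgroup cols="3"><c:tbody>
    <c:row><c:entry>a</c:entry><c:entry>b</c:entry><c:entry>c</c:entry></c:row>
  </c:tbody></c:tgroup></c:table>
</report>
"""

print(table_shapes(DOCBOOK_DOC))  # [(2, 2)]
print(table_shapes(REPORT_DOC))   # [(1, 3)]
```

The same function works on both documents because it keys on the namespace, not on the surrounding markup; an editor, viewer, or stylesheet module could do likewise.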

That's a pretty nice list. What would we lose?

  1. Well, first of all, there's a lot of legacy. Moving it would require either a large, backwards-incompatible change in a short, sharp shock, or a long grace period during which both the old and new markup are supported.

  2. There are plenty of folks out there who will argue that the right number of namespaces in a document is zero, that for authoring purposes, namespaces are too much trouble. If you can't have zero namespaces, having exactly one is the next best thing. In any event, more than one is too many.

    I don't subscribe to this point of view (Quelle surprise!). Almost all of my documents contain at least three namespaces (DocBook, XLink, and XInclude). But I am not unsympathetic to the view that authors have trouble with namespaces. On the other hand, if we think MathML and SVG are going to be widely used, then authors will get used to islands of “other namespace” markup in their documents; perhaps because authoring tools completely hide the fact, which they could also do with CALS table markup.

  3. To the extent that authors rely on non-namespace-aware tools to process their documents (yes, Virginia, it still happens), putting table markup in a namespace may be problematic.
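The last point is easy to demonstrate. The sketch below stands in for a script written against the legacy, un-namespaced markup: it searches for the bare element name, so it silently finds nothing once the same table moves into a namespace. The namespace URI and the function name count_entries are hypothetical.

```python
import xml.etree.ElementTree as ET

CALS = "http://example.org/ns/cals-table"  # hypothetical namespace URI

BODY = "<tgroup cols='1'><tbody><row><entry>x</entry></row></tbody></tgroup>"
LEGACY = f"<table>{BODY}</table>"
NAMESPACED = f"<table xmlns='{CALS}'>{BODY}</table>"


def count_entries(xml_text):
    """A script written for the legacy markup: it matches the bare
    name 'entry' and knows nothing about namespaces."""
    return len(list(ET.fromstring(xml_text).iter("entry")))


legacy_count = count_entries(LEGACY)
namespaced_count = count_entries(NAMESPACED)

print(legacy_count)      # 1
print(namespaced_count)  # 0: the element is now {namespace}entry
```

Nothing fails loudly here, which is part of the problem: the tool just stops seeing the tables.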

Where does that leave us? If no table markup existed and we were inventing it from scratch, I'd be firmly in the “put it in a namespace” camp. As it is, I think it would be a hard sell in the community. I've been burned once, so I'd be reluctant to take up such an effort without some indication that the community really wanted to go there.

Comments

The biggest advantage I see is that this gives an opportunity to standardize which subset of the original CALS table spec is supported. There doesn't seem to be an agreement there yet. The only standard subset is the Exchange subset, but I often see other subsets.

—Posted by Sjoerd Visscher on 12 May 2010 @ 01:44 UTC #

Am I the only one who'd never heard of CALS tables before this?

(But I did my googling...)

Re namespaces - I'm in the "I want to believe" camp.

How would this design compare to using/improving the existing XHTML markup for tables? Are these tables in the sense of presentation, or tables in a more SQL-ish sense, or, most likely I suppose, a little of both?

If they're at the data end, there's a growing pile of tech (D2RQ etc) for mapping tabular SQL data into RDF. Having a nice clean namespace for this stuff would probably help apply similar tricks to markup...

—Posted by Dan Brickley on 12 May 2010 @ 02:33 UTC #

The "nested grammars" section of the RELAX NG tutorial (13 in the XML-syntax tutorial, 14 in the compact-syntax tutorial) explains how to do "doughnut" schemas that are independent of what can go in the doughnut hole. Here's the example schema:

    cell.content = notAllowed
    start =
      element table {
        element tr {
          element td { cell.content }+
        }+
      }

And here's the sample that includes it, overriding cell.content:

    start =
      element doc {
        (element p { inline }
         | grammar {
             include "table.rnc" {
               cell.content = parent inline
             }
           })*
      }
    inline =
      (text
       | element em { inline })*

—Posted by John Cowan on 12 May 2010 @ 04:28 UTC #

When reading your "What would be gained" list I couldn't help but think "That's what they said about XLink", so I was somewhat amused to see at the end of the article that you admitted to having been "burned" before.

'On the other hand, if we think MathML and SVG are going to be widely used, then authors will get used to islands of “other namespace” markup'

They might, or they might vote with their feet and use HTML5 (or HTML5-style markup), where namespaces, well-formedness, and lots of other good things are brushed under the carpet.

At NAG, the internal markup (which follows an in-house DTD) uses CALS tables and MathML extensively, but we don't use any namespaces internally. When generating PDF (with 3b2, now Arbortext print publisher) we process the CALS markup directly, and for (X)HTML the stylesheets add XHTML or MathML namespaces as required while generating the public files.

—Posted by David Carlisle on 13 May 2010 @ 08:52 UTC #

I think the case is stronger for tables than links, but yeah, once bitten, twice shy. Perhaps all is already lost.

—Posted by Norman Walsh on 13 May 2010 @ 11:07 UTC #

I have always been a fan of namespaces. When I was working on the Elsevier DTD, we put the SOExt table into its own namespace. We even put entry back into our namespace because it had its own content model, differing from the SOExt entry content model (as you indicate above). Because we used a DTD, we used some namespace binding trickery to get the CALS elements into the 'cals' namespace and entry into our namespace, all without a prefix. I admit that few people understood this. (I am no longer working with Elsevier.)

—Posted by Simon Pepping on 16 Jun 2010 @ 07:00 UTC #

How would this affect RFE 2964576 ("disallow table in entry")? I regret being the user who "pointed out that the table entry element permits table in the entry element." (Regret, because I might have found that capability useful.)

—Posted by Mike Maxwell on 18 Jul 2010 @ 06:36 UTC #

Mike,

I don't think it's related. The CALS semantics forbid tables inside tables; all this essay considers is whether or not the CALS elements should be in a namespace.

As for the nesting issues, I think entrytbl or HTML tables are your only options. My guess is that nested tables are forbidden because circa-1980 formatters couldn't cope. Today they can, but short of revising CALS, I don't think we can do anything about it.

(That said, you are of course free to extend DocBook and allow it if your processing system can cope.)

—Posted by Norman Walsh on 20 Jul 2010 @ 02:04 UTC #