Canonical XML and xml:id

Volume 8, Issue 119; 14 Sep 2005; last modified 08 Oct 2010

It's not really news anymore, but xml:id is now a Recommendation. Alas, it's just a little too early to declare victory. The job won't be completely finished until we address XML Canonicalization.

A program is only complete when its last user is dead.

It's not really news anymore I suppose, but xml:id Version 1.0 is now a Recommendation. Alas, it's just a little too early to declare victory. The job won't be completely finished until we address the problems revealed in Canonical XML (C14N).

In case you aren't an XML hack, or you happened not to have noticed the discussion, the problem in brief is this: C14N specifies that when you canonicalize a document subset, “attributes in the xml namespace, such as xml:lang and xml:space” are inherited. That means if you start with a document like this one:

<doc>
  <wrapper xml:space="preserve">
    Some content.
    <nested>More content</nested>
  </wrapper>
</doc>

and apply C14N to the element named “nested”, its canonical form is:

<nested xml:space="preserve">More content</nested>

That works fine for xml:lang and xml:space which have inheritable semantics. It doesn't work at all for xml:id: an ID is associated with a particular element, it does not inherit.
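To make the inheritance rule concrete, here is a minimal sketch in Python of the ancestor lookup that C14N performs for a document subset. It is not a canonicalizer: the helper name `inherited_xml_attrs` and the `xml:id="w1"` added to the sample document are assumptions made purely for illustration. It only collects the attributes that would be copied onto the element.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:* attributes under this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def inherited_xml_attrs(root, target):
    """Collect xml:* attributes from the ancestors of `target`.

    The nearest ancestor wins, and attributes already present on
    `target` itself are skipped. Hypothetical helper, for illustration.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    inherited = {}
    node = parent_of.get(target)
    while node is not None:
        for name, value in node.attrib.items():
            if name.startswith(XML_NS) and name not in target.attrib:
                inherited.setdefault(name, value)  # keep the nearest value
        node = parent_of.get(node)
    return inherited

# Same shape as the document above, plus an assumed xml:id on the wrapper.
doc = ET.fromstring(
    '<doc><wrapper xml:space="preserve" xml:id="w1">'
    'Some content.<nested>More content</nested></wrapper></doc>'
)
nested = doc.find(".//nested")
attrs = inherited_xml_attrs(doc, nested)
local = {name[len(XML_NS):]: value for name, value in attrs.items()}
print(local)
```

Under these assumptions the nested element picks up both `space` and `id`, which is exactly the problem: the ID belongs to the wrapper, not to its descendants.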

Anyway, whether you think xml:id was a bad idea or not is water under the bridge at this point. Critical, for me anyway, is the fact that C14N was already broken before xml:id came along. The xml:base attribute doesn't inherit in the same way as xml:lang and xml:space either, and XML Base was already a Recommendation when we started working on xml:id.

The open question at this point is, how should C14N be fixed? For xml:id and any other attributes that may be added to the xml namespace in the future, the answer is clear: C14N must not treat them as inheritable. What's less clear to me is what C14N should do with xml:lang, xml:space, and xml:base. I think it breaks down into two cases, each with two possibilities:

  1. How should xml:lang and xml:space be handled?

    a. They should not be inherited.

    b. They should be inherited just as they are now.

  2. How should xml:base be handled?

    a. It should not be inherited.

    b. It should be subjected to “fixup” and then inherited. By fixup, I mean that the correct absolute value should be used, rather than the literal value of the attribute as the current specification indicates.
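As a rough illustration of what such a fixup could look like, the following Python sketch resolves each xml:base value against the one above it, from the root down to the element being extracted. The helper name `effective_base`, the `document_uri` parameter, and the sample document and URIs are all hypothetical; nothing here is taken from the C14N specification.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def effective_base(root, target, document_uri=""):
    """Return the absolute base URI in effect at `target`.

    Hypothetical helper: walks from the root down to `target`,
    resolving each xml:base against the base inherited so far.
    """
    parent_of = {child: parent for parent in root.iter() for child in parent}
    chain = []
    node = target
    while node is not None:
        chain.append(node)
        node = parent_of.get(node)
    base = document_uri
    for element in reversed(chain):  # root first, target last
        value = element.get(XML_BASE)
        if value is not None:
            base = urljoin(base, value)
    return base

# Assumed sample document; the URIs are placeholders.
doc = ET.fromstring(
    '<doc xml:base="http://example.org/docs/">'
    '<wrapper xml:base="chapter/"><nested href="page"/></wrapper></doc>'
)
nested = doc.find(".//nested")
print(effective_base(doc, nested))  # http://example.org/docs/chapter/
```

Copying the literal value would give the extracted element xml:base="chapter/", which no longer resolves to the same place once the ancestors are gone; the fixed-up value carries the full context with it.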

We talked about this a little bit on the XML Core WG telcon today and I think Daniel convinced me that the right answers are “1b” and “2b”; his argument being the principle of least surprise.

What do you think the right answers are?

Comments

1b/2b sound reasonable to me.

—Posted by Oleg Tkachenko on 15 Sep 2005 @ 10:09 UTC #

Yes, 1b/2b seems natural. And xml:id should of course not be inherited. But how should xml:somethingnew be handled? Should there be a new version of the C14N specification each time a new attribute in the xml namespace is defined?

(This seems to be another argument for an inclusive XML specification, where all attributes in the xml namespace, C14N, and some other stuff, is defined in the same specification.)

—Posted by Rasmus Kaj on 15 Sep 2005 @ 10:29 UTC #

What does "fixup" mean -- can you provide an example or two? I am concerned about adding URL/URI processing to a canonicalizer. I note that no other imported "xml:" values are modified or otherwise fixed-up.

Since xml:id will clearly not be imported, there's precedent to not import xml:base. It seems that the cleanest way to update C14N is to remove the "magic import" clause for attributes not already mentioned (space and lang).

How often is C14N used by itself, as opposed to feeding it into a digest mechanism? How often is C14N used on an XML subset, by itself, as opposed to feeding it into a digest mechanism? My guess is not very, and rarely. In that case, "fixing up" attribute values is of no use.

—Posted by Rich Salz on 15 Sep 2005 @ 05:41 UTC #

I think the issue with xml:base is something like this:

<foo xml:base="http://example.com/">
  <bar xml:base="shoes/">
    <baz href="socks"/>
  </bar>
</foo>

If I pull out the baz element, simple inheritance adds attribute xml:base="shoes/", which is incomplete. The effective XML-base value for the baz element is http://example.com/shoes/.

I suspect this is a general problem with the xml-namespace attributes: they are often used to annotate a subtree of the XML document with some new trait whose inheritance rules the canonicalization recommendation cannot be expected to anticipate. Tricky!

—Posted by Damian Cugley on 16 Sep 2005 @ 02:45 UTC #

I think that there may be a few different patterns that could be applied to the canonicalisation of xml: namespaced attributes. Inherited or not is a pattern that's mentioned here. With respect to URI values there's also whether or not they should be 'fixed' up.

A future rev to C14N could call out various attribute 'canonicalisation' patterns and grandfather existing attributes by stating which characteristics apply to which xml: attributes. It could also place an obligation on those creating new names in the xml: namespace to state which canonicalisation pattern applies to the new attribute. That at least avoids the C14N spec maintenance problem (until a new pattern is required). However, it does not avoid the need to maintain implementations, particularly if a default pattern is stated for the treatment of 'unknown' attributes and the new attribute requires non-default treatment.

BTW: it also seems to me that describing the scope of the influence of a given attribute might be something that a schema could usefully do.

—Posted by Stuart Williams on 26 Sep 2005 @ 04:28 UTC #