Canonical XML and xml:id
It's not really news anymore, but xml:id is now a Recommendation. Alas, it's just a little too early to declare victory. The job won't be completely finished until we address XML Canonicalization.
A program is only complete when its last user is dead.
It's not really news anymore I suppose, but xml:id Version 1.0 is now a Recommendation. Alas, it's just a little too early to declare victory. The job won't be completely finished until we address the problems revealed in Canonical XML (C14N).
In case you aren't an XML hack, or you happened not to have noticed the discussion, the problem in brief is this: C14N specifies that when you canonicalize a document subset, “attributes in the xml namespace, such as xml:lang and xml:space” are inherited. That means if you start with a document like this one:
<doc>
  <wrapper xml:space="preserve">
    Some content.
    <nested>More content</nested>
  </wrapper>
</doc>
and apply C14N to the element named “nested”, its canonical form is:
<nested xml:space="preserve">More content</nested>
That works fine for xml:lang and xml:space, which have inheritable semantics. It doesn't work at all for xml:id: an ID is associated with a particular element; it does not inherit.
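To make the failure concrete, here is a minimal Python sketch of the inheritance rule as described above: every attribute in the xml namespace found on an ancestor is copied onto the element being extracted. It is not a C14N implementation; the function name `inherited_xml_attributes` and the `xml:id="w1"` added to the wrapper are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# The sample document from above, plus a hypothetical xml:id on the wrapper.
DOC = """<doc>
  <wrapper xml:space="preserve" xml:id="w1">
    Some content.
    <nested>More content</nested>
  </wrapper>
</doc>"""


def inherited_xml_attributes(root, target):
    """Collect xml:* attributes from the ancestors of target.

    Mimics the blanket rule: anything in the xml namespace is inherited,
    with the nearest ancestor winning and the target's own attributes
    taking precedence over inherited ones.
    """
    parent_of = {child: parent for parent in root.iter() for child in parent}
    inherited = {}
    node = parent_of.get(target)
    while node is not None:
        for name, value in node.attrib.items():
            if name.startswith("{" + XML_NS + "}"):
                inherited.setdefault(name, value)  # nearest ancestor wins
        node = parent_of.get(node)
    for name in target.attrib:
        inherited.pop(name, None)
    return inherited


root = ET.fromstring(DOC)
nested = root.find(".//nested")
attrs = inherited_xml_attributes(root, nested)
print(sorted(name.split("}")[1] for name in attrs))  # ['id', 'space']
```

Under the blanket rule, the extracted element picks up the wrapper's ID along with xml:space, so it would claim an identity that belongs to a different element.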
Anyway, whether you think xml:id was a bad idea or not is water under the bridge at this point. What's critical, for me anyway, is the fact that C14N was already broken before xml:id came along. The xml:base attribute doesn't inherit in the same way as xml:lang and xml:space either, and XML Base was already a Recommendation when we started working on xml:id.
The open question at this point is, how should C14N be fixed? For xml:id and any other attributes that may be added to the xml namespace in the future, the answer is clear: C14N must not treat them as inheritable. What's less clear to me is what C14N should do with xml:lang, xml:space, and xml:base. I think it breaks down into two cases, each with two possibilities:
1. How should xml:lang and xml:space be handled?
   a. They should not be inherited.
   b. They should be inherited just as they are now.
2. How should xml:base be handled?
   a. It should not be inherited.
   b. It should be subjected to “fixup” and then inherited. By fixup, I mean that the correct absolute value should be used, rather than the literal value of the attribute as the current specification indicates.
We talked about this a little bit on the XML Core WG telcon today and I think Daniel convinced me that the right answers are “1b” and “2b”; his argument being the principle of least surprise.
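As a rough illustration of what “1b” and “2b” would mean in practice, the following Python sketch computes the xml: attributes that would be placed on the apex element of a subset: xml:lang and xml:space are inherited as they are today, xml:base is resolved against its ancestors before being inherited, and everything else in the xml namespace (including xml:id) is left alone. The function name, the `document_base` parameter, and the sample document are all hypothetical, and using `urljoin` for resolution is a simplification rather than anything a C14N revision specifies.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_NS = "http://www.w3.org/XML/1998/namespace"
LANG = "{%s}lang" % XML_NS
SPACE = "{%s}space" % XML_NS
BASE = "{%s}base" % XML_NS

# Hypothetical sample document, not taken from any specification.
DOC = """<foo xml:base="http://example.com/" xml:lang="en">
  <bar xml:base="shoes/" xml:id="b1">
    <baz href="socks"/>
  </bar>
</foo>"""


def apex_xml_attributes(root, target, document_base=""):
    """Effective xml:lang, xml:space and xml:base for target (options 1b/2b)."""
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # Build the path from the document element down to the target.
    chain = [target]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    chain.reverse()

    result = {}
    base = document_base
    for node in chain:
        # 1b: plain inheritance, values nearer the target override.
        for name in (LANG, SPACE):
            if name in node.attrib:
                result[name] = node.attrib[name]
        # 2b: "fixup", resolving each xml:base against the one above it.
        if BASE in node.attrib:
            base = urljoin(base, node.attrib[BASE])
    if base:
        result[BASE] = base
    # Any other xml:* attribute on an ancestor (xml:id here) is not inherited.
    return result


root = ET.fromstring(DOC)
baz = root.find(".//baz")
effective = apex_xml_attributes(root, baz)
print(effective[BASE])  # http://example.com/shoes/
```

The ID on the bar element never reaches baz, while the base URI arrives in its resolved form rather than as the literal value of the nearest attribute.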
What do you think the right answers are?
Comments
1b/2b sound reasonable to me.
Yes, 1b/2b seems natural. And xml:id should of course not be inherited. But how should xml:somethingnew be handled? Should there be a new version of the C14N specification each time a new attribute in the xml namespace is defined?
(This seems to be another argument for an inclusive XML specification, where all attributes in the xml namespace, C14N, and some other stuff are defined in the same specification.)
What does "fixup" mean -- can you provide an example or two? I am concerned about adding URL/URI processing to a canonicalizer. I note that no other imported "xml:" values are modified or otherwise fixed-up.
Since xml:id will clearly not be imported, there's precedent to not import xml:base. It seems that the cleanest way to update C14N is to remove the "magic import" clause for attributes not already mentioned (space and lang).
How often is C14N used by itself, as opposed to feeding it into a digest mechanism? How often is C14N used on an XML subset, by itself, as opposed to feeding it into a digest mechanism? My guess is not very, and rarely. In that case, "fixing up" attribute values is of no use.
I think the issue with xml:base is something like this:
<foo xml:base="http://example.com/">
  <bar xml:base="shoes/">
    <baz href="socks"/>
  </bar>
</foo>
If I pull out the baz element, simple inheritance adds attribute xml:base="shoes/", which is incomplete. The effective XML-base value for the baz element is http://example.com/shoes/.
I suspect this is a general problem with the xml-namespace attributes: they are often used to annotate a subtree of the XML document with some new trait whose inheritance rules the canonicalization recommendation cannot be expected to anticipate. Tricky!
I think that there may be a few different patterns that could be applied to the canonicalisation of xml: namespaced attributes. Inherited or not is a pattern that's mentioned here. With respect to URI values there's also whether or not they should be 'fixed' up.
A future rev to C14N could call out various attribute 'canonicalisation' patterns and grandfather existing attributes by stating which characteristics apply to which xml: attributes. It could also place an obligation on those creating new names in the xml: namespace to state which canonicalisation pattern applies to the new attribute. That at least avoids the C14N spec. maintenance problem (until a new pattern is required). However, it does not avoid the need to maintain implementations, particularly if a default pattern is stated for the treatment of 'unknown' attributes and the new attribute requires non-default treatment.
BTW: it also seems to me that describing the scope of the influence of a given attribute might be something that a schema could usefully do.