<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="pto" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#">
<info>
    
    
    
    
    
    
    
<title>Version Identifiers and XML</title><biblioid class="uri">http://norman.walsh.name/2004/12/15/xml11</biblioid>
<volumenum>7</volumenum>
<issuenum>213</issuenum>
<pubdate>2004-12-15T06:43:27-05:00</pubdate>
<date>$Date: 2006-10-05 22:38:13 -0400 (Thu, 05 Oct 2006) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2004</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>David Orchard says XML blew it. He's talking about XML 1.1, but
his beef isn't with the technical changes, it's with the version
number.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
</info>

<para xml:id="p1"><personname>
      <firstname>David</firstname>
<surname>Orchard</surname>
    </personname> says <link xlink:href="http://www.pacificspirit.com/blog/2004/12/02/version_identifiers_best_practice_and_xml_blew_it">XML blew it</link>. He's talking about XML 1.1,
but his beef isn't with the technical changes, it's with the version
number.</para>

<para xml:id="p2">David has been working on a
<link xlink:href="http://www.w3.org/2001/tag/">TAG</link> finding on
<link xlink:href="http://www.w3.org/2001/tag/doc/versioning.html">
      <citetitle>Versioning
XML Languages</citetitle>
    </link>, so we've talked a lot
about many aspects of versioning. (Ostensibly, I'm working on the finding
too, but I've been busy on other things and recent technical
progress is all due to David.)</para>

<para xml:id="p3">But working together doesn't mean we always agree, and I take
exception to a few of the things David says. Full disclosure: I'm on
the <link xlink:href="http://www.w3.org/XML/Core/">XML Core Working Group</link>
and I may be taking this all a bit personally.</para>

<para xml:id="p4">The first thing I note is a simple,
factual error. David says<footnote>
      <para xml:id="p5">I'm excerpting bits of his essay;
you'll probably want to read it all in context first. My essay started
as an email message to David, but he persuaded me that there was value
in carrying on the discussion “in public”.</para>
    </footnote>:</para>

<blockquote>
    <para xml:id="p6">XML
1.1 adds allowed characters - particularly control characters - to the
name production.</para>
  </blockquote>

<para xml:id="p7">That's just not true. Some of the C0 control characters
(0x01-0x1F) are now allowed (in numeric-escaped form only) in
character data where they were not previously allowed. They aren't
allowed in names. Additional alphabetic and ideographic characters are
allowed in names, however, so his point is still valid.</para>

<para xml:id="p8">And, alas, XML 1.1 is not backwards compatible
with XML 1.0. Not perfectly anyway. The C1 control characters
(0x80-0x9F) were accidentally allowed in character data in
XML <emphasis>1.0</emphasis>.
They are now forbidden except in their numeric-escaped forms.</para>
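
<para>The character rules in question can be written down as a rough sketch.
Python is used for illustration only, and the function names
<literal>literal_ok_10</literal> and <literal>literal_ok_11</literal> are
hypothetical. The ranges are a reading of the <literal>Char</literal> and
<literal>RestrictedChar</literal> productions and should be checked against
the Recommendations before anyone relies on them.</para>

<programlisting language="python"><![CDATA[def literal_ok_10(cp):
    """True if code point cp may appear literally in an XML 1.0 document."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def literal_ok_11(cp):
    """True if code point cp may appear literally in an XML 1.1 document."""
    # Char in XML 1.1 admits almost everything except U+0000 and surrogates.
    in_char = (0x1 <= cp <= 0xD7FF
               or 0xE000 <= cp <= 0xFFFD
               or 0x10000 <= cp <= 0x10FFFF)
    # RestrictedChar: legal only as numeric character references.
    restricted = (0x1 <= cp <= 0x8
                  or cp in (0xB, 0xC)
                  or 0xE <= cp <= 0x1F
                  or 0x7F <= cp <= 0x84
                  or 0x86 <= cp <= 0x9F)
    return in_char and not restricted
]]></programlisting>

<para>Under this reading, <literal>literal_ok_10(0x90)</literal> is true while
<literal>literal_ok_11(0x90)</literal> is false, which is the small
incompatibility described above.</para>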

<blockquote>
<para xml:id="p9">To a great extent, XML 1.1 was called 1.1 because it was hoped that
this would help foster adoption, rather than xml 1.01 or xml 2.0. I
think it's particularly sad that XML has resorted to this kind of
marketing effort in it's version identifiers.</para>
</blockquote>

<para xml:id="p10">To which I am inclined to reply, “Oh, come on!”
None of the technical arguments David is making would change
if it were called XML 1.01 (or XML 1.0.1, which was actually proposed).
</para>

<para xml:id="p11">One can argue that it should have been called 2.0 because of its
tiny backwards incompatibility, and that's an entirely valid argument,
except that it ignores the fact that all specifications are developed
in both a social and a technical context.</para>

<blockquote>
<para xml:id="p12">Here's a rule that could help: Version identifiers should be
rigorously used to identify compatible or incompatible changes.</para>
</blockquote>

<para xml:id="p13">Yep, that would have made it 2.0. But users have non-technical
expectations about version numbers and it's not clear to me that the
community would have benefitted from the larger number. Maybe the
lesson here is that putting version numbers in the title of your
specification muddies the waters.</para>

<para xml:id="p14">The whole issue of backwards and forwards compatibility in a
vocabulary that has an explicit version number is an interesting one
anyway. Suppose this is an instance of the first version of some
vocabulary:</para>

<programlisting>&lt;transaction&gt;
  &lt;buy shares="1000"&gt;SUNW&lt;/buy&gt;
&lt;/transaction&gt;</programlisting>

<para xml:id="p15">If a second version of this vocabulary adds some optional element,
we can say that it's backwards compatible because the preceding instance
is still a valid, understandable instance.</para>

<para xml:id="p16">But can we say the same thing about this instance?</para>

<programlisting>&lt;transaction version="1.0"&gt;
  &lt;buy shares="1000"&gt;SUNW&lt;/buy&gt;
&lt;/transaction&gt;</programlisting>

<para xml:id="p17">If the second version mandates “<literal>version="2.0"</literal>”,
then the preceding message isn't really a valid second version instance.
It's trivial to transform it into one, and the semantics are exactly
what you'd expect, but that's not quite the same
thing, is it?</para>
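
<para>To show how small that transformation is, here is a minimal sketch in
Python. It assumes the hypothetical transaction vocabulary above and that the
only change needed is the value of the <literal>version</literal> attribute;
the function name <literal>upgrade</literal> is invented for illustration.</para>

<programlisting language="python">import xml.etree.ElementTree as ET


def upgrade(message):
    """Rewrite a version 1.0 transaction message as a version 2.0 one."""
    root = ET.fromstring(message)
    # Only touch messages that declare themselves to be version 1.0.
    if root.get("version") == "1.0":
        root.set("version", "2.0")
    # Serialize back to text; the content of the message is left untouched.
    return ET.tostring(root, encoding="unicode")</programlisting>

<para>Everything inside the message survives unchanged, but a first version
processor still could not have produced the result on its own.</para>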

<para xml:id="p18">It's trivial to transform XML 1.0 documents into XML 1.1 documents
too, and the semantics are exactly what you'd expect:</para>

<orderedlist>
<listitem>
<para xml:id="p19">If there's an XML declaration in the version 1.0 document,
replace “<literal>version="1.0"</literal>” with
“<literal>version="1.1"</literal>”. If there isn't an XML declaration,
add one with “<literal>version="1.1"</literal>”.</para>
</listitem>
<listitem>
<para xml:id="p20">If the document contains any unescaped C1 control characters,
escape them.</para>
</listitem>
</orderedlist>
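
<para>The two steps above can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not a robust converter: the input is
the already-decoded text of a well-formed XML 1.0 document, any control
characters occur only in character data or attribute values (references are
not recognized in comments, processing instructions, or CDATA sections), and
the names <literal>to_xml11</literal>, <literal>DECL_10</literal>, and
<literal>CONTROLS</literal> are hypothetical.</para>

<programlisting language="python"><![CDATA[import re

# An XML declaration announcing version 1.0, in either quote style.
DECL_10 = re.compile(r'^<\?xml\s+version\s*=\s*(["\'])1\.0\1')
# Assumption: escape DEL and the whole C1 block, U+007F through U+009F.
CONTROLS = re.compile(r'[\x7f-\x9f]')


def to_xml11(text):
    """Turn the decoded text of an XML 1.0 document into XML 1.1 text."""
    # Step 2: replace each literal control character with a numeric reference.
    text = CONTROLS.sub(lambda m: '&#x%X;' % ord(m.group()), text)
    # Step 1: rewrite the version in an existing declaration, or add one.
    if DECL_10.match(text):
        return DECL_10.sub(lambda m: m.group(0).replace('1.0', '1.1'),
                           text, count=1)
    return '<?xml version="1.1"?>\n' + text
]]></programlisting>

<para>The sketch escapes U+007F and U+0085 along with the rest of the C1
range. That is a simplifying choice that goes slightly beyond the two steps;
a numeric reference is acceptable for any of these characters in XML 1.1.</para>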

<para xml:id="p21">Given that you <emphasis>have</emphasis> to touch the document
to make it an XML 1.1 document (because XML 1.0 does mandate version
1.0 in the XML declaration, even if the declaration is implicit) and
given that no real-world documents actually use the C1 control
characters (outside of test suites), I think it's a stretch to say XML
blew it.</para>

<para xml:id="p22">David goes on to explain:</para>

<blockquote>
<para xml:id="p23">The reason is that the XML 1.0 has very few extensibility points that
allow for compatibility, and name characters are not one of these
extensibility points. XML 1.0 decided that any extension, like a
control character, results in a fault. It does not have any way of
dealing with the extensions that doesn't result in a fault. If it had
a substitution model for name extensions, then the XML 1.1 names could
be understood by XML 1.0 processors.</para>
</blockquote>

<para xml:id="p24">I don't believe any practical, sensible substitution model could
have been devised. The fact that the changes made were not forward
compatible is just a fact of life. The fact that they're not backwards
compatible is unfortunate. That was probably a mistake: a very minor
technical one and an apparently larger political one.</para>

<blockquote>
<para xml:id="p25">There's an axiom that emerges: Forward compatible extensions can only
be done if a substitution model for the extensions exist.
</para>
<para xml:id="p26">If XML 1.1 (sic) had provided a substitution model, like the
must ignore unknowns, then XML 1.1 truly would be compatible with XML
1.0.</para>
</blockquote>

<para xml:id="p27">David, are you suggesting that “must ignore” would have been the
slightest bit reasonable in this context? That suggests that an XML 1.0
processor should treat <literal>&lt;a<inlinemediaobject>
<imageobject>
<imagedata fileref="examples/U1230.gif" width="0.8em"/>
</imageobject>
<textobject>
<phrase>Ethiopic U1230</phrase>
</textobject>
</inlinemediaobject>b&gt;</literal>
as if it were <literal>&lt;ab&gt;</literal>, which strikes me as ludicrous.
(That's an Ethiopic &amp;#x1230; in the first name.)
</para>

</essay>

