<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       xmlns:foaf="http://xmlns.com/foaf/0.1/"
       xml:lang="en"
       version='5.0'>
<info>
<title>XML 2.0? No, seriously.</title>
<volumenum>11</volumenum>
<issuenum>23</issuenum>
<pubdate>2008-02-20T10:17:24-05:00</pubdate>
<date>$Date: 2006-01-05 09:38:44 -0500 (Thu, 05 Jan 2006) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2008</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Maybe it's madness to consider XML 2.0 seriously.
The cost of deployment would be significant.
Simultaneously convincing a critical mass of users to switch without
turning the design process into a farce would be very difficult.
And yet, the alternatives look a little like madness too.</para>
</abstract>
</info>

<epigraph>
<attribution><personname><firstname>B.</firstname>
<surname>Stroustrup</surname></personname></attribution>
<para xml:id='p2'>Design and programming are human activities;
forget that and all is lost.
</para>
</epigraph>

<para xml:id='p1'>I found three topics on my desk simultaneously last week.
</para>

<orderedlist>
<listitem>
<para xml:id='p3'>The proposal to amend
<link xlink:href="../07/xml105e">the character set of XML 1.0</link>
identifiers by erratum.
</para>
</listitem>
<listitem>
<para xml:id='p4'>The proposal to deploy
<link xlink:href="http://www.w3.org/TR/curie/">CURIEs</link>, an awkward,
confusing extension of the
<link xlink:href="http://www.w3.org/TR/xml-names/#ns-qualnames">QName concept</link>.
</para>
</listitem>
<listitem>
<para xml:id='p5'>A thread of discussion suggesting that we consider allowing
prefix undeclaration
in
<citetitle xlink:href="http://www.w3.org/TR/xml-names/">Namespaces in XML
1.0</citetitle>. That's right, <emphasis>1.0</emphasis>.
</para>
</listitem>
</orderedlist>

<para xml:id='p6'>We're in an odd place.</para>

<para xml:id='p7'>XML has been more successful, and in more and more different
arenas, than could have been imagined. But…</para>

<para xml:id='p8'>XML 1.0 is seriously broken
in the area of internationalization, one of its key strengths, because
it hasn't kept pace with changes to Unicode.</para>

<para xml:id='p9'>QNames, originally designed as a way of creating qualified
element and attribute names, have also been used in more and more
different arenas than could have been imagined. Unfortunately, the
constraints that make sense for XML element and attribute names don't
make sense, are in fact unacceptable, in many of those other arenas.</para>

<para xml:id='p10'>And in XML, we learned that it is sometimes useful to be able to
take a namespace binding out of scope.</para>
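
<para>As a point of reference, Namespaces in XML 1.1 already allows
this: an empty prefix declaration removes the binding for the scope of
that element. A minimal sketch (the element names and the
<uri>http://example.com/ns/p</uri> namespace are invented for
illustration):</para>

<programlisting><![CDATA[
<?xml version="1.1"?>
<outer xmlns:p="http://example.com/ns/p">
  <p:inner>The prefix p is bound here.</p:inner>
  <section xmlns:p="">
    <!-- Within section, p is unbound; using p:inner here would be an error. -->
    <plain>No binding for p is in scope.</plain>
  </section>
</outer>]]></programlisting>

<para>In Namespaces in XML 1.0, only the default namespace can be
undeclared this way (<code>xmlns=""</code>); an empty value in a
prefixed declaration is an error.</para>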

<para xml:id='p11'>XML 1.1 addressed some of these concerns, but also introduced
backwards incompatibilities. Those incompatibilities seemed justified
at the time, although they seem so obviously unnecessary and foolish
now. In short, we botched our opportunity to fix the problem “right”.
</para>

<para xml:id='p12'>What to do?</para>

<para xml:id='p13'>I think I could just about accept (have accepted, even) any one of the
items on the list above. Fixing the Unicode problem in XML 1.0 by erratum
is stretching the definition of erratum to the breaking point, but by itself
is probably an acceptable compromise. Adding
pseudo-QName identifiers to the world is confusing and ugly, but by
itself probably not the worst thing that could be done. And allowing
XML 1.0 documents to undeclare namespace prefixes, by itself, seems sensible in
retrospect.</para>

<para xml:id='p14'>But all three? Really?</para>

<para xml:id='p15'>Perhaps, dare I say it, it is time to consider XML 2.0 instead.
Trouble is, if XML 2.0 gets spun up as an open-ended design exercise,
it'll be crushed by the
<wikipedia page="Second-system_effect">second-system effect</wikipedia>.
And if XML 2.0 gets spun up as “only” a simplification of XML 1.0,
it won't get any traction. If XML 2.0 is to be a success, it has to offer
enough in the way of new functionality to convince people with
successful XML 1.0 deployments (that's everyone, right?) that it's
worth switching. At the same time, it has to be about the same size
and shape as XML 1.0 when it's done or it'll be perceived as too big,
too complicated, too much work.</para>

<para xml:id='p16'>With that in
mind, here are some candidate requirements for XML 2.0.</para>

<orderedlist>
<listitem>
<para xml:id='p17'><emphasis>All</emphasis> well-formed XML 1.0 documents that do
not include an internal or external subset shall be well-formed XML
2.0 documents.</para>
<para xml:id='p18'>In other words, <emphasis>backwards compatibility</emphasis> for
well-formed XML documents!
But it's time to move all that DTD stuff off into
another specification. Maybe we can even add <code>&lt;!NAMESPACE</code>
in XML 2.0 DTDs. If that spec ever gets written.</para>
</listitem>
<listitem>
<para xml:id='p19'>The XML 2.0 specification shall be no longer than the XML 1.0
specification.</para>
<para xml:id='p20'>In other words, you can't add seventy-three new whiz-bang
features. You can't do <emphasis>anything</emphasis> that will require
more prose to explain than you can remove by taking out DTD syntax.</para>
</listitem>
<listitem>
<para xml:id='p21'>All XML 2.0 documents shall support XML Namespaces.
</para>
<para xml:id='p22'>In other words, what most of the XML world already requires. The
experiment is over; namespaces won. Like it or not.</para>
</listitem>
<listitem>
<para xml:id='p23'>XML 2.0 shall define a mapping from QNames to URIs.</para>
<para xml:id='p24'>In other words, <code>db:para</code> ≡
(<uri>http://docbook.org/ns/docbook</uri>, <literal>para</literal>)
≡ <uri>http://docbook.org/ns/docbook#para</uri>, by definition. (For
<literal>xmlns:db="http://docbook.org/ns/docbook"</literal>; and we can
argue about the precise mapping rules later.)
</para>
</listitem>
<listitem>
<para xml:id='p25'>XML 2.0
shall allow QNames to represent a broader range of values.
</para>
<para xml:id='p26'>In other words, <literal>isbn:1234</literal> is too useful to forbid.
But we're still not allowing it as the name of an element or attribute.
</para>
</listitem>
<listitem>
<para xml:id='p27'>XML 2.0 shall provide an unambiguous,
context-<emphasis>in</emphasis>sensitive lexical form for QNames.
</para>
<para xml:id='p28'>In other words, it will be possible to represent any XML 2.0
document without any namespace declarations at all. I've
<link xlink:href="http://norman.walsh.name/2004/11/10/xml20#p28">given
some thought</link> to how I think this might be done.</para>
</listitem>
<listitem>
<para xml:id='p29'>XML 2.0 shall do away with the requirement that
documents can have only a single root element.</para>
<para xml:id='p30'>In other words, make <literal>document = extParsedEnt</literal>.
Perhaps this is only a plausible requirement, but the fact is that many
tools, like XSLT, are already comfortable with such instances and I'm going to take
advantage of it in the next item.
</para>
</listitem>
<listitem>
<para xml:id='p31'>XML 2.0 shall address the problem of named character references.
</para>
<para xml:id='p32'>In other words, make it possible to write <literal>&amp;nbsp;</literal>
or <literal>&amp;Exists;</literal> even in documents that don't have any
entity declarations. The actual notation wouldn't have to use
“<literal>&amp;</literal>” but it might as well.</para>
<para xml:id='p33'>I have in mind a proposal for this:</para>

<programlisting><![CDATA[
<xml:entity name="nbsp" text="&#160;"/>
<xml:entity name="Exists" text="∃"/>
<xml:entity href="myentities.xml"/>
<document>...</document>]]></programlisting>

<para xml:id='p34'>As a matter of simplicity, I'm pretty confident I want to treat
these new entities like the old ones, and like CDATA sections, and say that
they are purely an authoring convenience; they don't survive parsing. In fact,
I'm not even sure the parser has to report those elements; it can consume
them as it goes.</para>
<para xml:id='p35'>That means you
have to have a facility like XSLT 2.0's character maps to put them back
at serialization time, if you want them back. Yes, I know this is still
an inconvenience for some, but the alternative would require that all XML
tools grow support for entity reference objects and that seems
inconvenient for far more people.
</para>
</listitem>
</orderedlist>
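
<para>The character-map facility mentioned in the last item already
exists in XSLT 2.0. A minimal sketch, assuming the two entity names
used in the example above; the map name <literal>entities</literal> is
arbitrary, and the identity template is there only to make the
stylesheet complete:</para>

<programlisting><![CDATA[
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Apply the character map when serializing the result. -->
  <xsl:output method="xml" use-character-maps="entities"/>

  <!-- Write each listed character as an entity reference on output. -->
  <xsl:character-map name="entities">
    <xsl:output-character character="&#160;" string="&amp;nbsp;"/>
    <xsl:output-character character="&#8707;" string="&amp;Exists;"/>
  </xsl:character-map>

  <!-- Identity transform: copy the input unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>

<para>A character map writes its replacement strings verbatim, so
today the output is only well-formed if something else declares
<literal>&amp;nbsp;</literal> and <literal>&amp;Exists;</literal>.
That is the gap the requirement is meant to close.</para>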

<para xml:id='p36'>I think it is possible to address the requirements I've outlined
without doing undue violence to existing applications. From an API
perspective, I think the worst part will be dealing with QNames as
first-class objects. It will mean, for example, that attribute values
become lists. In the simple case, a list of one text node, but for
attributes that contain QNames (in their context-insensitive format),
a list of (text|QName)*.</para>
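
<para>To make that concrete, here is a sketch of one such attribute.
The <literal>{namespace}local</literal> form is borrowed from the
notation usually credited to James Clark and merely stands in for
whatever context-insensitive syntax XML 2.0 might adopt; it is an
assumption, as are the <literal>rule</literal> element and its
<literal>test</literal> attribute.</para>

<programlisting><![CDATA[
<rule test="count({http://docbook.org/ns/docbook}para) > 0"/>

<!-- An API might report the value of test as a list of three items:
       text   "count("
       QName  (http://docbook.org/ns/docbook, para)
       text   ") > 0"
     An attribute with no QNames would still be a list of one text item. -->]]></programlisting>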

<para xml:id='p37'>In my optimistic moments, I imagine that XML 2.0
could thread the needle between insufficient value to motivate transition
and so much complexity that it can't possibly succeed. Though whether
a committee could thread this particular needle (with this particular
camel) is an open question.</para>

</essay>
