<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       xmlns:foaf="http://xmlns.com/foaf/0.1/"
       xml:lang="en"
       version='5.0'>
<info>
<title>XML 1.0 (Fifth Edition)</title>
<volumenum>11</volumenum>
<issuenum>20</issuenum>
<pubdate>2008-02-07T12:23:19-05:00</pubdate>
<date>$Date$</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2008</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>The fifth edition of XML 1.0 is now a “proposed edited recommendation”.
New editions usually do little more than incorporate errata, which is hardly
newsworthy. This one is different.</para>
</abstract>
</info>

<para xml:id='p1'>The
<link xlink:href="http://www.w3.org/2005/10/Process-20051014/tr.html#q76">proposed edited recommendation</link> of
<link xlink:href="http://www.w3.org/TR/2008/PER-xml-20080205/">Extensible
Markup Language (XML) 1.0 (Fifth Edition)</link> is now out for review.
The review period is long, lasting until 16 May, because one of the proposed
changes is significant.</para>

<para xml:id='p2'>A couple of weeks ago, I
<link xlink:href="/2008/01/22/html5#p7">poked a little fun</link>
at the SGML specification
for introducing new appendixes and new parsing rules as
“<link xlink:href="http://www.onelook.com/?w=corrigenda&amp;ls=a">corrigenda</link>”.
Now it's my turn to be on the receiving end. The
<link xlink:href="http://www.w3.org/XML/Core/">XML Core WG</link> is
<link xlink:href="http://www.w3.org/XML/xml-V10-4e-errata#E09">proposing</link>
to change the repertoire of characters allowed in XML names as an
“<link xlink:href="http://www.onelook.com/?w=erratum&amp;ls=a">erratum</link>”.
</para>

<para xml:id='p3'>Before the fifth edition, XML 1.0 was explicitly based on
Unicode 2.0. As of the fifth edition, it is based on Unicode 5.0.0
<emphasis>or later</emphasis>. This effectively allows names to use not
only the characters in use today, but also characters that will be added
tomorrow.</para>

<para xml:id='p4'>One of the real strengths of XML from the very beginning was
that it required processors to support Unicode. This made XML, and all
XML processors, international. But as Unicode has been extended to
support languages written in Cherokee, Ethiopic, Khmer, Mongolian,
Canadian Syllabics, and other scripts, XML 1.0's explicit use of
Unicode 2.0 has prevented it from growing as well. That's a problem
that XML must fix if it wants to continue to be regarded as a
universal text format.</para>

<para xml:id='p5'>The working group's first attempt to address this problem,
<link xlink:href="http://www.w3.org/TR/xml11/">XML 1.1</link>, has been
largely unsuccessful. For a variety of reasons, XML 1.1 did more than
<emphasis>the minimum needed to declare victory</emphasis>, and some of
that “more” makes it backwards incompatible with XML 1.0.
So it was <link xlink:href="/2004/09/30/xml11">D.O.A.</link>
</para>

<para xml:id='p6'>The fifth edition does not change the status of
<emphasis>any</emphasis> existing XML 1.0 document with respect to
well-formedness or validity. Nor does it introduce
<emphasis>any</emphasis> of the backwards-incompatible changes
introduced in XML 1.1.</para>

<para xml:id='p7'>It isn't entirely without pain, unfortunately. Even if we
imagine that all parsers will eventually be updated to reflect the fifth
edition (and it's possible to be optimistic on this point, as the change
actually makes parsers smaller and simpler), there will be some period of
time in which your (fourth edition) parser might reject my (fifth
edition) document.</para>
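
<para>As a rough illustration (my own sketch, not an example from the
specification), consider a document whose element name is written in
Cherokee, one of the scripts added to Unicode after version 2.0. Under
the fifth edition's rules the name is legal and the document is
well-formed; a parser that still implements the fourth edition's name
rules would be expected to reject it:</para>

<programlisting language="xml"><![CDATA[<?xml version="1.0" encoding="utf-8"?>
<ᏣᎳᎩ>hello</ᏣᎳᎩ>]]></programlisting>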

<para xml:id='p8'>The XML Core WG is taking the position that the benefits of
extending XML 1.0 in this way outweigh the costs imposed by the
change. It remains to be seen if the community will agree. Bear in
mind that this sort of change isn't entirely unprecedented: we
previously decoupled <tag class="attribute">xml:lang</tag> attributes
from the relevant RFCs and we tinkered with the specific version of
Unicode 3 referenced. That said, this is still a much more substantial
change.</para>

<para xml:id='p9'>Personally, I'm concerned about making this large a change
as an erratum. But I'm persuaded that our other options, doing nothing
or attempting to introduce some other, new version of XML, are worse.
</para>

<para xml:id='p10'>Are you? <link xlink:href="mailto:xml-editor@w3.org">Tell us</link>.
</para>

</essay>
