The proposed edited recommendation of Extensible Markup Language (XML) 1.0 (Fifth Edition) is now out for review. The review period is long, lasting until 16 May, because one of the proposed changes is significant.
A couple of weeks ago, I poked a little fun at the SGML specification for introducing new appendixes and new parsing rules as “corrigenda”. Now it's my turn to be on the receiving end. The XML Core WG is proposing to change the repertoire of characters allowed in XML names as an “erratum”.
Before the fifth edition, XML 1.0 was explicitly based on Unicode 2.0. As of the fifth edition, it is based on Unicode 5.0.0 or later. This effectively allows not only characters used today, but also characters that will be used tomorrow.
One of the real strengths of XML from the very beginning was that it required processors to support Unicode. This made XML, and all XML processors, international. But as Unicode has been extended to support languages written in Cherokee, Ethiopic, Khmer, Mongolian, Canadian Syllabics, and other scripts, XML 1.0's explicit use of Unicode 2.0 has prevented it from growing as well. That's a problem that XML must fix if it wants to continue to be regarded as a universal text format.
The working group's first attempt to address this problem, XML 1.1, has been largely unsuccessful. For a variety of reasons, XML 1.1 did more than the minimum needed to declare victory and some of that “more” makes it backwards incompatible with XML 1.0. So it was D.O.A.
The fifth edition does not change the status of any existing XML 1.0 document with respect to well-formedness or validity. Nor does it introduce any of the backwards-incompatible changes introduced in XML 1.1.
It isn't entirely without pain, unfortunately. Even if we imagine that all parsers will eventually be updated to reflect the fifth edition (and it's possible to be optimistic on this point, as the change actually makes parsers smaller and simpler), there will be some period of time in which your (fourth edition) parser might reject my (fifth edition) document.
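To give a feel for why the fifth-edition rule is simple to implement, here is a minimal Python sketch of a name check built from the NameStartChar and NameChar ranges in the fifth edition's grammar. The function and constant names are my own, purely for illustration, and this is not taken from any real parser. It checks names only, not whole documents.

```python
# Sketch of the XML 1.0 Fifth Edition name rule.
# Helper names are hypothetical; only the code point ranges come from the spec.

# NameStartChar: ":" | [A-Z] | "_" | [a-z] | a handful of broad ranges.
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# NameChar adds these to NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),        # middle dot
    (0x300, 0x36F),      # combining diacritical marks
    (0x203F, 0x2040),
]


def _in_ranges(ch, ranges):
    """Return True if the character's code point falls in any (lo, hi) range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name_start_char(ch):
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml_name(s):
    """Check a string against the fifth-edition Name production."""
    return (
        bool(s)
        and is_name_start_char(s[0])
        and all(is_name_char(c) for c in s[1:])
    )
```

With these ranges, a name written in Cherokee letters such as `"\u13a0\u13a1"` passes `is_xml_name`, because the letters fall inside the broad 0x37F to 0x1FFF range. A parser whose tables are tied to Unicode 2.0 would not accept that name, and that is exactly the fourth-edition versus fifth-edition mismatch described above.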
The XML Core WG is taking the position that the benefits of extending XML 1.0 in this way outweigh the costs imposed by the change. It remains to be seen if the community will agree. Bear in mind that this sort of change isn't entirely unprecedented: we […] from the relevant RFCs and we tinkered with the specific version of Unicode 3 referenced. That said, this is still a much more substantial change.
Personally, I'm concerned about making this large a change as an erratum. But I'm persuaded that our other options (do nothing, or attempt to introduce some other, new version of XML) are worse.
Are you? Tell us.