XML-ER

Volume 15, Issue 6; 20 Feb 2012

The XML Error Recovery Community Group is up and running. And my spirits are raised just the tiniest amount about the future of XML on the web.

At XML Prague, Anne van Kesteren presented his work on “XML5”. Kudos to Anne for agreeing to present his work to an audience that might have been perceived as hostile.

It's an idea whose time has come, I think. Between his presentation and the first round of drinks that evening, a small group (including Jeni, Robin, and Anne, at least) had worked out with Liam that the right forum to do this in was a W3C Community Group. Anne agreed to do the hard work (edit the document), so I agreed to chair.

The next morning, a few clicks were all it took to spin up the XML Error Recovery Community Group. We've started to craft a charter, requirements, and an issues list. This morning, Anne announced the first draft.

All in all, we're off to a good start, I think. You're welcome to join and help us along.

Comments

I can't (yet) join the CG due to W3C process issues with $EMPLOYER, but I continue to think that a schema-free (or more accurately, document-type-insensitive) process is Just Wrong. HTML is a single document type, and we have a very strong signal about what users expect: they expect their broken documents to be treated the way that mainstream browsers already treat them. But across all possible XML document types, we have little information about what users expect.

I grant the conversion of character entities is probably right, as is replacement of invalid or invalidly coded characters. But if the document is structurally broken, which surely is the interesting case, we cannot know how to repair it without knowing what it is supposed to look like in some sense. Otherwise, error recovery is another name for eliminating syntactic errors, which are easily detected, in favor of semantic errors, which are not so easy to find.

The first-draft link points to a plain text document in the form of a Unix directory listing. Please fix.

—Posted by John Cowan on 20 Feb 2012 @ 05:01 UTC

Link fixed. Sorry about that.

—Posted by Norman Walsh on 20 Feb 2012 @ 07:18 UTC