Here are the slides that I presented at XML Prague 2011.

Below are the slides that I presented at XML Prague this morning, with a few snippets of commentary before each one. I'm not sure how valuable they are on their own. You may get more mileage out of just reading the draft report.

I don't have conclusions yet, so I'll ask for help instead.


  • History

  • Use Cases

  • Conclusions

  • Ask for your help

Foil #1

I don't claim to be speaking for anyone. I'll try not to say inflammatory things, but my biases may show.


Facts are facts. But any opinions expressed are the opinions only of myself and may or may not reflect the opinions of anybody else with whom I may or may not have discussed the issues at hand.

Jim Melton

In particular, I do not claim that what follows is necessarily the consensus opinion of the HTML/XML Task Force.

Foil #2

In the beginning, there was SGML.

History (In the beginning…)

Foil #3

HTML came from SGML. Sort of. Although it was specified as an SGML application, it was never broadly implemented that way.

History (HTML)

Foil #4

XML really did come from SGML. You can parse XML with an SGML parser if you fiddle the SGML declaration in the right way.

History (XML)

Foil #5

The next logical step was to combine HTML and XML.

History (XHTML)

Foil #6

That might have formed the basis of HTML5.

History (Alternative history 1)

Foil #7

But it didn't.

History (Bzzzt! No! …)

Foil #8

Alternatively, we could have adjusted XML to better meet the HTML use cases.

History (Alternative history 2)

Foil #9

Alas, the window of opportunity for this plan has passed. Maybe it never really existed.

History (Bzzzt! No! …)

Foil #10

What we have instead is divergent evolution. One of the ideas that arose, at least indirectly, out of the task force discussions was a proposal for MicroXML. I don't know what might come of that.

Present day

Foil #11

If you think XML and HTML are both important, this is…unfortunate.

What's wrong?

Perhaps the biggest challenge that faces the W3C's technical work on the Web is the growing chasm between HTML and XML.

T. V. Raman
  • TAG Issue-67: HTML and XML Divergence

    • Tag soup

    • Namespaces

    • Syntactic differences (quoted attribute values)

    • DOM differences (<tbody> insertion)

    • Distributed extensibility

Foil #12

Of these problems, the DOM differences and distributed extensibility are the most troubling. Here's a valid HTML5 document (I think) that's also a well-formed XML document.

No bonus points to this audience for guessing what the XML DOM looks like.

DOM differences (markup)

Foil #13

But what about the HTML5 DOM? As you can see, the HTML5 parser injects a required <tbody> element. So you can't get the same DOM even if you use polyglot markup.
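The XML half of this comparison is easy to check with Python's standard library. The sketch below parses a small polyglot-style table with ElementTree, then applies a hypothetical `insert_implied_tbody` helper that imitates just one rule of the HTML5 tree-construction algorithm: a `<tr>` that appears directly inside `<table>` gets wrapped in an implied `<tbody>`. It is a toy under that single assumption, not the real HTML5 parser.

```python
import xml.etree.ElementTree as ET

POLYGLOT = ("<html><body><table>"
            "<tr><td>1</td></tr><tr><td>2</td></tr>"
            "</table></body></html>")


def insert_implied_tbody(root: ET.Element) -> ET.Element:
    """Hypothetical helper: mimic ONE HTML5 parsing rule.

    Consecutive <tr> children of a <table> are wrapped in an implied
    <tbody>. The real HTML5 tree builder does much more than this.
    """
    for table in list(root.iter("table")):
        children = list(table)
        for child in children:
            table.remove(child)
        tbody = None
        for child in children:
            if child.tag == "tr":
                if tbody is None:
                    tbody = ET.SubElement(table, "tbody")
                tbody.append(child)
            else:
                tbody = None  # a non-<tr> child ends the current run of rows
                table.append(child)
    return root


xml_dom = ET.fromstring(POLYGLOT)
print([c.tag for c in xml_dom.find("body/table")])    # ['tr', 'tr']

html_like = insert_implied_tbody(ET.fromstring(POLYGLOT))
print([c.tag for c in html_like.find("body/table")])  # ['tbody']
```

The same bytes yield two different trees, which is why polyglot markup alone can't guarantee the same DOM.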

DOM differences (DOM)

Foil #14

The other big problem is distributed extensibility. The HTML5 WG has decided it won't have any. That has implications for groups, both inside and outside the W3C, that might want to create extensions.

In fairness, namespaces, the distributed extensibility mechanism that XML provides, are not well loved. Me, I like them just fine, but there always has to be one weirdo in the group.

It's also worth observing that there are in-principle objections to extensibility because it impacts interoperability. I can see those arguments. I don't agree with them, but I can see them.

Distributed extensibility

  • In practice

    • SVG in W3C and HTML WG

    • RDFa in W3C not in HTML WG

    • FBML not in W3C

  • In practice

    • Namespaces

  • In principle

Foil #15

Against this history and background, the W3C Technical Architecture Group formed an HTML/XML Task Force.


  • Robin Berjon

  • Michael Champion

  • James Clark

  • John Cowan

  • Michael Kay

  • Anne van Kesteren

  • Yves Lafon (staff contact)

  • Noah Mendelsohn

  • Henri Sivonen

  • Norman Walsh (chair)

Foil #16

We looked at several use cases, starting with processing HTML with an XML toolchain.

Use Case: Consume HTML

How can an XML toolchain be used to consume HTML?

  • Author HTML5 with polyglot markup.

  • Add an HTML5 parser to the front end of your toolchain.

    • Doesn't solve the pernicious “document.write” problem or other script-related problems.

    • But short of running a JavaScript engine on the content, nothing is likely to solve those problems.
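Python's standard library has no HTML5 parser, so the sketch below uses `html.parser.HTMLParser` as a much cruder stand-in, just to show the shape of the "parser on the front end" idea: tokenize HTML, build an ElementTree, and hand that tree to XML tools. The class `HtmlToEtree`, the function `html_to_etree`, and the synthetic `root` wrapper are hypothetical names. Unlike a conforming HTML5 parser, it implements none of the error-recovery rules (no implied end tags, no `<tbody>` insertion), and it does nothing about scripts.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML void elements never have end tags or content.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class HtmlToEtree(HTMLParser):
    """Hypothetical, much-simplified HTML front end for an XML toolchain."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("root")  # synthetic wrapper element
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "checked") arrive as None; use "".
        attrib = {name: (value or "") for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name;
        # stray end tags (no match on the stack) are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text after a child element belongs in its tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_etree(markup: str) -> ET.Element:
    parser = HtmlToEtree()
    parser.feed(markup)
    parser.close()
    return parser.root


tree = html_to_etree("<p class=intro>Hello<br>world<p>again")
# The second <p> ends up nested inside the first; a real HTML5 parser
# would close the first <p> instead.
print(ET.tostring(tree, encoding="unicode"))
```

The gap between this and a real HTML5 parser is exactly the error-correction behavior, which is why a conforming parser is the thing to put on the front end.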

Foil #17

And its logical converse, processing XML with an HTML toolchain.

Use Case: Consume XML

How can an HTML5 toolchain be used to consume XML?

  • HTML5 tools won't be designed to deal with arbitrary element names in arbitrary namespaces.

  • Transforming to HTML5 is probably the best route.

    • Even a partial transformation to remove namespaces, PIs, etc. might prove valuable.
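As one illustration of such a partial transformation, here is a minimal Python sketch, assuming ElementTree as the XML tool and a made-up sample document. ElementTree's default parser already discards processing instructions and comments, so the hypothetical `strip_namespaces` helper only has to rename elements and attributes to their local names.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet href="style.xsl" type="text/xsl"?>
<doc xmlns="http://example.com/ns" xmlns:x="urn:example:x">
  <x:para x:role="note">Hello</x:para>
</doc>"""


def local_name(name: str) -> str:
    """'{uri}local' -> 'local'; names without a namespace are unchanged."""
    return name.rsplit("}", 1)[-1]


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Hypothetical helper: rename elements and attributes in place.

    Two attributes that differ only by namespace would collide here;
    the last one wins.
    """
    for elem in root.iter():
        if isinstance(elem.tag, str):  # skip comment/PI nodes if present
            elem.tag = local_name(elem.tag)
        items = list(elem.attrib.items())
        elem.attrib.clear()
        for name, value in items:
            elem.set(local_name(name), value)
    return root


# The default parser drops the xml-stylesheet PI, so only namespaces remain.
root = strip_namespaces(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

This loses information (which namespace a name came from), which is part of why it's only a partial step toward HTML5.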

  • It's probably best not to encourage users to imagine this will be broadly successful.

Foil #18

Embedding HTML in XML.

Use Case: Embed HTML

How can islands of HTML be embedded in XML?

  • Use the XML serialization of HTML5.

  • Escape the markup.

  • Rely on more sophisticated multipart-message handling systems.

Regardless, some care may be necessary. How are the HTML islands going to be processed? By “clipping” them out and processing them with an HTML5 tool, or by passing the whole DOM to the tool?
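The "escape the markup" option, and the "clipping" style of processing, can be sketched in a few lines of Python with ElementTree. The container names `doc` and `html-island` are hypothetical. The HTML fragment is deliberately not well-formed XML, so it could not use the XML serialization; stored as text, it survives the round trip unchanged and can then be handed to an HTML5 parser.

```python
import xml.etree.ElementTree as ET

# Tag-soup HTML that is NOT well-formed XML (unclosed <p> and <b>).
html_island = "<p>Tag soup<br>with an <b>unclosed paragraph"

# Hypothetical container vocabulary: <doc> holding an <html-island>.
doc = ET.Element("doc")
island = ET.SubElement(doc, "html-island")
island.text = html_island  # ElementTree escapes <, > and & on output

serialized = ET.tostring(doc, encoding="unicode")
print(serialized)

# "Clipping": an XML tool reads the island back as a plain string,
# which could then be passed to an HTML5 parser.
clipped = ET.fromstring(serialized).find("html-island").text
assert clipped == html_island
```

The trade-off is that, to the XML tools, the island is just a string; nothing in the XML DOM reflects its HTML structure.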

Foil #19

And its logical converse.

Use Case: Embed XML

How can islands of XML be embedded in HTML?

  • The HTML5 parser interprets unfamiliar markup as an error and corrects for it.

  • Correction can include changing the order and nesting of elements.

  • Practically speaking: you can't embed a “naked” island of XML in HTML5.

Foil #20

You can clothe it in <script>. Yeah, the name's a bit of a shame, but for legacy reasons…

Use Case: Embed XML

Putting clothes on your XML

  <script type="application/xml">
    <data>
      <title>Your XML</title>
      <gpx xmlns="">
        <wpt lat="50.077484" lon="14.443800">
          <ele>200.08</ele>
          <time>2007-01-06T17:33:04Z</time>
          <name>001</name>
          <sym>Restaurant</sym>
        </wpt>
      </gpx>
    </data>
  </script>

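On the consuming side, the island has to be pulled out as text and parsed separately. A minimal Python sketch, assuming the standard library's `html.parser` (which, like HTML5, treats `<script>` content as raw text) and a hypothetical `XmlScriptExtractor` class: collect the text of each `<script type="application/xml">` and feed it to an XML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class XmlScriptExtractor(HTMLParser):
    """Hypothetical helper: collect the text of each
    <script type="application/xml"> element."""

    def __init__(self):
        super().__init__()
        self.islands = []
        self._buffer = None  # list of text chunks while inside a matching script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/xml":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.islands.append("".join(self._buffer))
            self._buffer = None


PAGE = """<!DOCTYPE html>
<script type="application/xml">
  <data>
    <title>Your XML</title>
    <gpx xmlns="">
      <wpt lat="50.077484" lon="14.443800">
        <ele>200.08</ele>
      </wpt>
    </gpx>
  </data>
</script>
"""

extractor = XmlScriptExtractor()
extractor.feed(PAGE)
extractor.close()

# The script content arrives as a single string of text; parse it as XML.
data = ET.fromstring(extractor.islands[0].strip())
wpt = data.find("gpx/wpt")
print(wpt.get("lat"), wpt.findtext("ele"))
```

Note that the inner `</title>` does not end anything from the HTML parser's point of view; only `</script>` does.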
Foil #21

Note that the content of a <script> element is CDATA. All those things that look like elements are actually escaped text.

Use Case: Embed XML

Belt and suspenders

    &lt;data&gt;
      &lt;title&gt;Your XML&lt;/title&gt;
      &lt;gpx xmlns=""&gt;
        &lt;wpt lat="50.077484" lon="14.443800"&gt;
          &lt;ele&gt;200.08&lt;/ele&gt;
          &lt;time&gt;2007-01-06T17:33:04Z&lt;/time&gt;
          &lt;name&gt;001&lt;/name&gt;
          &lt;sym&gt;Restaurant&lt;/sym&gt;
        &lt;/wpt&gt;
      &lt;/gpx&gt;
    &lt;/data&gt;

Foil #22

Finally, it's important to note that just because HTML5 doesn't have distributed extensibility mechanisms doesn't mean that it doesn't have any extensibility. It has a bunch of extensibility mechanisms; maybe you should just use those.

Use Case: Embed XML

Use the HTML5 extensibility mechanisms

  <div class="data">
    <h1 class="title">Your XML</h1>
    <div class="gpx">
      <div class="wpt"
           data-lat="50.077484" data-lon="14.443800">
        <span class="ele">200.08</span>
        <span class="time">2007-01-06T17:33:04Z</span>
        <span class="name">001</span>
        <span class="sym">Restaurant</span>
      </div>
    </div>
  </div>

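A mapping like this can also be generated mechanically. The Python sketch below is one hypothetical way to do it with ElementTree (the function name and the rules are assumptions, not the slide's exact conversion): each element name becomes a `class`, each attribute becomes a `data-*` attribute, elements with children become `<div>`, and leaves become `<span>`. It assumes un-namespaced input with lowercase attribute names, since `data-*` names must be valid HTML.

```python
import xml.etree.ElementTree as ET

GPX = """<gpx xmlns="">
  <wpt lat="50.077484" lon="14.443800">
    <ele>200.08</ele>
    <name>001</name>
  </wpt>
</gpx>"""


def to_html5_classes(elem: ET.Element) -> ET.Element:
    """Hypothetical mapping: element name -> class, attributes -> data-*.

    Elements with children become <div>; leaf elements become <span>.
    """
    out = ET.Element("div" if len(elem) else "span", {"class": elem.tag})
    for name, value in elem.attrib.items():
        out.set("data-" + name, value)
    out.text = elem.text
    for child in elem:
        converted = to_html5_classes(child)
        converted.tail = child.tail  # keep the original whitespace
        out.append(converted)
    return out


html = to_html5_classes(ET.fromstring(GPX))
print(ET.tostring(html, encoding="unicode"))
```

Unlike the `<script>` approach, the result is ordinary HTML5 that HTML tools, CSS, and the DOM all understand.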
Foil #23

One of the things that pushed HTML and XML apart is that XML's error handling is arguably inappropriate in a web context. We could try to fix those things.

Use Case: a more forgiving XML

How can XML be made easier to use?

  • Guaranteeing that you'll get well-formed XML out of naive attempts to generate it with “print” statements is tricky.

  • Rules could be devised for providing some degree of markup minimization/error correction in XML.

  • It's possible to consider other simplifications as well, for namespaces, for example.
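To make the first point concrete, here is a minimal Python sketch, assuming a made-up title string, of how "print statement" generation goes wrong as soon as the data contains markup characters, and how escaping fixes this particular case. It only covers text content; attribute values, element names, and character encoding are further ways naive generation can fail.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Fish & Chips <cheap>"  # data that happens to contain markup characters

# Naive generation with string formatting ("print statements")
naive = "<item><title>%s</title></item>" % title

# The same template with the text escaped first
careful = "<item><title>%s</title></item>" % escape(title)


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(naive))    # False: the bare "&" and "<cheap>" break XML
print(is_well_formed(careful))  # True
```

An XML parser must reject the naive output outright; a more forgiving XML would need rules for what to do with it instead.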

Foil #24

The Task Force talked about XForms but failed to craft a use case on which we could all agree.

Use Case: XForms?

  • Is XForms a use case or a specific solution to the use case of better form controls?

  • Is XForms different in some substantial way from the general “embedding XML in HTML” use case?

Foil #25


What you can do

  • Review the Task Force Report

  • Talk to the communities you know about the use cases they have.

  • Report use cases that you think are not met.

Foil #26

If you're as depressed as I am, remember that the future is longer than the past. Just because things are bad today doesn't mean they can't be made better in the future.

Future history?

Foil #27

Thank you!

Foil #28

Hope that was useful.


I'm not sure what your arrows mean. I would have expected the line between HTML and HTML5 to be solid rather than dotted, because HTML5 is designed to be (almost) fully backward-compatible with HTML.

Posted by Secret Squirrel on 27 Mar 2011 @ 10:40pm UTC #

Yeah, that's fair. I changed that back and forth several times while I was working on the graphics. In the end, I think I chose a dashed line because it feels like the (re)specification of the parsing algorithm makes it a less direct descendant.

But maybe it should have been solid.

Posted by Norman Walsh on 28 Mar 2011 @ 11:07am UTC #