Rebuked

Volume 7, Issue 6; 13 Jan 2004; last modified 08 Oct 2010

Mark caught me serving broken XHTML. I wish my browser had done me that favor.

Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.

—Scott Adams

In a comment on my Postel’s Law post, Mark Pilgrim quite correctly chastises me for serving broken content. This, he goes on to point out in a second comment, makes me a bozo and an incompetent fool, at least by Tim Bray’s metric.

What happened was this: even though each essay is carefully checked for well-formedness and even validity, that checking doesn’t resolve the server-side includes. One of those includes I create by hand. Last time I edited it, I carelessly wrote it in HTML instead of XHTML and got an empty tag wrong. (Another include had some bogus namespace declarations in it, but that’s not a well-formedness problem.)
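One way to close that gap is to run the well-formedness check on the page after the includes have been expanded, not before. The following is a minimal sketch of that idea, not a description of the actual publishing setup: the Apache-style include syntax, the `expand_includes` and `is_well_formed` helpers, and the `/footer.inc` fragment are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed directive syntax: <!--#include virtual="/footer.inc" -->
INCLUDE_RE = re.compile(r'<!--#include\s+(?:virtual|file)="([^"]+)"\s*-->')


def expand_includes(page, includes):
    """Replace each include directive with the text of the named fragment.

    `includes` is a hypothetical mapping from include path to fragment text.
    """
    return INCLUDE_RE.sub(lambda m: includes[m.group(1)], page)


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Essay</title></head>"
    '<body><p>Essay text</p><!--#include virtual="/footer.inc" --></body>'
    "</html>"
)

# HTML-style empty tag: fine in HTML, a well-formedness error in XML.
HTML_STYLE = {"/footer.inc": "<p>Footer<br></p>"}
# XHTML-style empty tag.
XHTML_STYLE = {"/footer.inc": "<p>Footer<br/></p>"}

print(is_well_formed(PAGE))                                 # unexpanded page
print(is_well_formed(expand_includes(PAGE, HTML_STYLE)))    # broken include
print(is_well_formed(expand_includes(PAGE, XHTML_STYLE)))   # fixed include
```

The unexpanded page passes because an include directive is just a comment to an XML parser, which is exactly how the error slipped through. A real page with a DOCTYPE and named entities would need a parser configured to handle them.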

Now, Mark was able to read the essay despite my error because his browser ignored the error. Mark calls this lucky. I dunno. I think there are two ways of looking at it:

Browser good. The browser is applying heuristics to recover from XML well-formedness errors, thus allowing users to read the content. Reading the content is what’s important and the browser successfully renders it.
Browser bad. The browser is applying heuristics to recover from XML well-formedness errors, thus masking problems that only a bozo or an incompetent fool would be unable or unwilling to fix. This sets the expectation that applications should recover from XML well-formedness errors. That would be wrong, not least because it introduces the possibility of much subtler and more serious problems later on, as one application’s set of heuristics differs from another’s.

I infer that Mark subscribes to the former view. I subscribe to the latter. And I’m quite willing to play by the rules. The fact that browsers render broken content is a bug. Had they rejected the content as they should, it’s unlikely that my carelessness would ever have been seen by the public. At the very least, that would have saved me some embarrassment.

Comments

You might also want to know that Internet Explorer does not support &apos;. Instead of ' it simply shows &apos;. Another reason not to send XHTML to browsers that only support HTML.

Escaping the single quotes (') does seem to have introduced more IE oddness. But I'm not sure I need to escape them really, so let's not.
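For cases where an apostrophe does have to be escaped, a numeric character reference avoids the problem, since `&apos;` is predefined in XML but is not one of the HTML 4 named entities. A small sketch, assuming the content is generated from Python; `escape_text` is a hypothetical helper name.

```python
import html


def escape_text(text):
    """Escape markup-significant characters, including both quote characters.

    html.escape emits a numeric reference for the apostrophe rather than
    the named entity &apos;.
    """
    return html.escape(text, quote=True)


print(escape_text("Postel's Law"))  # prints: Postel&#x27;s Law
```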

Better? Yes, probably.

Actually, I subscribe to the view that browsers should go to virtually any length to display the information I asked them to display, but should inform me (subtly) if the information is not well-formed/valid/whatever is appropriate. I talked about this a year and a half ago, and my position has not changed: http://diveintomark.org/archives/2002/08/20/how_liberal_is_too_liberal

If User-Agents filled your logs full of complaints, you would have been notified without disrupting the user's experience. If search engines refused to index your page, since the right action under Postel's law would be to refuse to republish broken content, you would have a motivation to fix it.

I've written on this here: http://www.franklinmint.fm/blog/archives/000092.html

Both the "browser good" and "browser bad" POVs are broken.

We've lived in a "browser good" world for a decade now, and a switch to strict interpretation and display of content will cause ludicrous amounts of needless pain. It's lamentable, but them's the breaks.

Mark's right that a "browser bad" world is horrid from the user's perspective. From a developer's perspective, it's precisely what we need, because it leads to finding and fixing errors as early as possible. But very few of us are developers.

I'd argue what we really need is a "browser settable" world, where it defaults to "browser good" for legacy reasons, but lets the content producers and developers turn on a "browser bad" strict mode to find errors. Arguably, we're the only people that care about well-formedness in the first place...

Browser good. The web browser is not primarily a development tool, and validation services exist already for (X)HTML pages and XML feeds, if you're the sort that cares about such things.

If somebody wants to spend an afternoon educating my friends about why they shouldn't compose their blog entries in Microsoft Word, I'm all for it, but for my part, I've tried, and their eyes just glaze over whenever I attempt an explanation of character encodings on the web. Meanwhile, I'm grateful that I can still manage to read their terribly broken web pages and RSS feeds.

Mozilla-based browsers will gladly validate well-formedness. I do this on my weblog:

http://www.xml.com/pub/a/2003/03/19/dive-into-xml.html

In addition, I do a nightly scan for validity. This can turn up surprising combinations:

http://www.intertwingly.net/blog/1631.html

<quote>The browser is applying heuristics to recover from XML well-formedness errors,</quote>

That's not really what's happening: you are serving the files as text/html, so much of the XML syntax (e.g. <meta .../>) is just HTML syntax errors, and you are relying on the browser's lax reporting of HTML errors for any of your page to be read.

<quote>The fact that browsers render broken content is a bug. Had they rejected the content as they should,</quote>

The content is broken HTML if it is well-formed XHTML, but HTML agents are allowed to be lenient. If you wish to play by XML rules, serve the content with an XML MIME type; then you will get strict XML parsing rules in both IE and Mozilla/Netscape (although the display in IE might not be quite what you want unless you supply a stylesheet).

Basically by serving non-html as text/html you are _asking_ for highly tolerant and lax parsing.
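The point about the media type selecting the parsing rules can be sketched roughly as follows. This is an illustration of the principle only, not how any particular browser is implemented: the `accepts` function and the `XML_TYPES` set are assumptions, with Python's standard-library parsers standing in for a browser's tolerant HTML parser and strict XML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Media types assumed to trigger strict XML parsing in this sketch.
XML_TYPES = {"application/xml", "text/xml", "application/xhtml+xml"}


def accepts(body, content_type):
    """Return True if a client following these rules would render `body`."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in XML_TYPES:
        # XML rules: any well-formedness error is fatal.
        try:
            ET.fromstring(body)
        except ET.ParseError:
            return False
        return True
    # HTML rules: the tolerant parser consumes the input without raising.
    parser = HTMLParser()
    parser.feed(body)
    parser.close()
    return True


BROKEN = "<html><body><p>Footer<br></p></body></html>"

print(accepts(BROKEN, "text/html; charset=utf-8"))  # lenient path
print(accepts(BROKEN, "application/xhtml+xml"))     # strict path
```

The same bytes are accepted under text/html and rejected under an XML media type, so the choice of Content-Type header is what decides whether the error is ever surfaced.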

I thought of that myself yesterday. It occurred to me that the W3C validator is actually ignoring the content type when it does XHTML checking. Since the data is served as text/html, it has no business reporting XML well-formedness errors.

OTOH, it probably makes sense for a validator to exhibit this behavior.

So I was wrong. Again. Oh, well. I stand by my conviction that recovering from well-formedness errors in XML content is wrong.

<quote>I stand by my conviction that recovering from well-formedness errors in XML content is wrong.</quote>

I agree. So, be brave: serve your pages as application/xml

(if you add xsl:version="1.0" xmlns:xsl="..." to the <html> element, the pages will not only be checked for well formedness, but actually display in IE and mozilla as well....)