Not a political tax, the angle bracket tax.

I've spent a couple of days trying to decide if I want to respond to Jeff Atwood’s swipe at XML. There's a fairly substantial part of my brain that says “just leave it alone”. But I guess the fact that you're reading this proves I didn't listen.

Jeff's swipe is motivated by a couple of examples, so let's start with them. First off, there's SOAP. There's no question that SOAP is noisy. It's not hard to see why: it was designed by a fairly large committee, and it was designed to solve a pretty big, complex problem. (Maybe a big complex problem that next to no one actually has; I'm not a big fan of the WS-* stack, but that's a different issue.)

No one holds up the Yugo or the Edsel as marvels of modern automobile engineering, but by the same token, few people suggest that cars are a bad idea just because some cars are badly designed.

Next up, Jeff tries to show how much better RFC 822 is for email. There's no question that it's more compact; I could learn to author email in XML, but I'm not anxious to do it. On the other hand, it's pretty obvious that XML is actually better.

Jeff summarizes with a perfectly reasonable statement:

I don't necessarily think XML sucks, but the mindless, blanket application of XML as a dessert topping and a floor wax certainly does. Like all tools, it's a question of how you use it.

I can't really disagree with that. XML may be my hammer of choice, but I don't hang picture hooks with a sledge hammer.

If your data is really simple, maybe just a set of key/value pairs, and if both the key and the value are strings, and if the consequences of bad data are negligible, and if there's no possibility that there will ever be any additional complexity, then sure, maybe a flat text file is all you need.

On the other hand, the difference between:

  fruit=pear
  vegetable=carrot
  topping=wax

and

  <doc>
  <fruit>pear</fruit>
  <vegetable>carrot</vegetable>
  <topping>wax</topping>
  </doc>

isn't really that large, is it? (Or maybe you think it is; de gustibus non est disputandum.) Except, of course, that in the XML case, you don't have to write or maintain the code for the parser, unit tests for the parser, or documentation for the parser in every language (programming and documentation) and for every platform supported by your application. Nor do you have to worry about how to parse the file when the data contains spaces or newlines or Chinese characters. And some day, when the data is just a tiny bit more complex, you won't have to devise some clever hack for extending the format. You'll just use XML.
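To make that concrete, here is a minimal sketch of reading the XML version with a parser somebody else already wrote. It assumes Python and its standard xml.etree.ElementTree module, which is just one convenient choice; the load_settings helper and the sample values (a space, a newline, and a Chinese character) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample document: same shape as the example above, but with values that
# contain a space, a newline, and a Chinese character (hypothetical data).
DOC = """<doc>
<fruit>Asian pear</fruit>
<vegetable>carrot
(organic)</vegetable>
<topping>蜡</topping>
</doc>"""


def load_settings(xml_text):
    """Return a dict mapping each child element's name to its text."""
    root = ET.fromstring(xml_text)
    # Each direct child of <doc> is treated as one key/value pair.
    return {child.tag: child.text for child in root}


settings = load_settings(DOC)
print(settings["fruit"])
```

Nothing in this sketch deals with quoting, escaping, or character encodings; the parser handles those. An empty element would come back as None rather than an empty string, which is about the only wrinkle a caller would need to think about here.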

Let's consider another example: RELAX NG has both an XML syntax and a compact (non-XML) syntax. It's possible to author in both of them, and you can translate from one to the other without any loss of data (and with minimal loss of formatting).

The consequence? Honestly? I author mostly in the compact syntax. Nevertheless, I absolutely rely on the XML syntax because having the XML syntax makes the entire schema amenable to processing with an enormous range of XML tools. General purpose tools that work equally well with RELAX NG and other XML languages. Tools that I did not have to write, test, debug, or document.
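As a small illustration of what "general purpose tools" buys you, the sketch below points an ordinary XML parser at a RELAX NG schema in the XML syntax and lists the elements it declares. It assumes Python's standard xml.etree.ElementTree module; the tiny schema and the declared_elements helper are invented for this example and are not taken from any real project.

```python
import xml.etree.ElementTree as ET

# Namespace of the RELAX NG XML syntax.
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# A small schema written for this example. In the compact syntax it would
# read roughly:
#   element doc {
#     element fruit { text },
#     element vegetable { text },
#     element topping { text }
#   }
SCHEMA = """<element name="doc" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="fruit"><text/></element>
  <element name="vegetable"><text/></element>
  <element name="topping"><text/></element>
</element>"""


def declared_elements(schema_text):
    """List the element names declared by a RELAX NG (XML syntax) schema."""
    root = ET.fromstring(schema_text)
    # iter() walks the tree in document order, starting with the root itself.
    return [el.get("name") for el in root.iter(f"{{{RNG_NS}}}element")]


names = declared_elements(SCHEMA)
print(names)
```

The same few lines would work on any other XML vocabulary with a different namespace and element name, which is the point: no RELAX NG-specific parser was needed to ask a simple question about the schema.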

The lesson, if there's a lesson, is that even if you think a non-XML syntax is better for one purpose or another, the ability to translate into (and back out of) an XML syntax is a good thing. Of course, devising two syntaxes, and making them isomorphic, and making it possible to translate back and forth without destroying one format or the other, is a huge amount of work. It's usually easier to just use XML.

Jeff points out:

You could do worse than XML. It's a reasonable choice, and if you're going to use XML, then at least learn to use it correctly.

No argument from me there. Jeff follows that with a few questions, so I'll ask a few of my own.

  1. Is there really a better default choice than XML?

  2. Are you so confident that your intended use is never going to require any additional complexity that you're willing to bet against XML? Are you sure you'll never want any sort of validation or internationalization support?

  3. Do any of the XML alternatives actually have sufficient traction? (Maybe the answer to this question is yes. If JavaScript is your only platform of interest, for example, then JSON may be a reasonable choice for some data, security issues notwithstanding.)

  4. Wouldn't it be nice to have easily readable, understandable data and configuration files, without inflicting yet another random, ad hoc syntax on your ever-lovin' mind?

I don't necessarily think all the alternatives to XML suck, but the mindless, knee-jerk rejection of XML because it contains a small amount of additional syntax certainly does. Like all tools, it's a question of how you use it. Please think twice before subjecting yourself, your fellow programmers, and your users to more fragile, ASCII-only, ad hoc syntaxes.

Comments:

I think James Clark got this one right: "Any damn fool could produce a better data format than XML." The value of XML lies in that everyone agrees on it, and in what the tools do for you, not in its technical excellence or beauty.

Posted by Lars Marius Garshol on 13 May 2008 @ 09:02pm UTC #

Another common complaint is that XML is hard to read or time consuming to author.

I wonder if this criticism would go away if there were better default tools for dealing with XML. Currently, most people - whether they are beginners or advanced - use text editors to deal with XML.

One of the reasons this is difficult is that you typically need different things from your editor for different dialects, for example whitespace may be important in one but not another.

Anyway, I still typically use a text editor to write XML, although I may choose to view it in something else, so all this is pure speculation.

Posted by Adrian Mouat on 14 May 2008 @ 09:07am UTC #

Thanks Norm for the excellent rebuttal.

The only thing I would add is that Atwood's comparison with RFC-822 was particularly unfortunate; obviously he has never seen any of the hilariously complex regexes needed to find dates or email addresses. XML is a clear win here IMHO.

Posted by Alastair Rankine on 14 May 2008 @ 01:14pm UTC #

Most complaints I am aware of are in the tools and in the impedance mismatches.

1. Try shoehorning even a moderately complex XML schema into a relational database or vice versa if one wasn't designed for the other.

2. Tools intended to ease the development like the use of annotations in SQL Server don't unless item one was considered. It's just ugly.

3. Simple ideas like VRML97 scene graphs created ugly XML schemas because of the object-has-an-object attributes-can't-have-elements mismatch.

In short, mapping into and out of the tree is doable but painful and the shortcomings are often attributed to XML instead of the application languages that were shoehorned into XML Schema or XML itself. But we've been shoehorning since the beginning of the HTML-endowed Internet. A lot is learned that way but some devolution occurs. No size fits all comfortably for some.

Posted by len on 14 May 2008 @ 01:45pm UTC #

Norm:

I would emphasize a couple more points:

-extensibility

-relative access

How often can you freely extend a data structure without generally breaking existing consumers of this data structure?

How often do you get a relative access to a particular element of information without the need to understand the entire data structure?

When you combine that with all the benefits you listed, I see a compelling reason to deal with minor syntax or tooling annoyances.

P.S.: it is really annoying to mark up a comment to make it look ok...

Posted by Jean-Jacques Dubray on 14 May 2008 @ 01:56pm UTC #

Hi Norm,

My comment is completely tangential/irrelevant to this particular post as a whole, but as far as the "On the other hand, it's pretty obvious that XML _is actually better_." hyperlink:

I personally think it might be really nice if that particular link didn't send me to a page that had a big blank space with a note saying "If you had Flash installed you'd see a nice graph here".

Hoping you might be able to do something to change that.

--Mike

Posted by Michael(tm) Smith on 14 May 2008 @ 05:54pm UTC #

The difference is large enough to piss me off.

Imagine you're using XML for a properties (key/value) file, say for configuration.

You wouldn't write unit tests for that? Interesting. Personally, I'd trust an XML tool-chain about as far as I can spit.

Elliotte Rusty Harold, some time back, gave the example of borking Apache's configuration file. Whitespace in the wrong place or something. That unfortunate design choice was a justification for using XML.

The correct answer, then, now, in the future, is to write your own grammar. For a typical configuration properties files, it'd be pretty straightforward.

To anyone who believes such an exercise is too hard: Yet you're willing to write an XSD, DTD, or RELAX-NG? Please. (Sidebar: My VRML97 grammar implementation was smaller, quicker, and more correct than any XML-based "solution".)

In review, for the back of the class, please pay attention:

If your answer is XML, you asked the wrong question.

Cheers, Jason Osgood / Seattle WA

Posted by zappini on 14 May 2008 @ 11:09pm UTC #

Norm,

Thanks for the thoughts about the values of XML.

I work in the financial industry. I have seen XML used several times resulting in global applications that easily exchange data between multiple countries or multiple financial institutions. The few extra characters required by XML compared to a flat text file are nothing compared to the advantages of using XML. The advantages include not having to argue about which format to use, like in the old days before XML, and the availability of validation. The advantages of using XML include each party being able to use the tools they prefer to manipulate the data, not like in the old days when use of proprietary formats resulted in all parties being forced to use the only available tools for processing that format. Yet another advantage of XML is not having to worry about parsing files with extra blanks or using characters from languages that you never expected to see, like in the old days.

Posted by James Orenchak on 15 May 2008 @ 07:35pm UTC #

Rolling your own is for those with no imagination or foresight. Some people cannot imagine that data or information has any value other than in the particular context of a single task. XML is all about capturing data and information in ways that preserve value for unknown uses. Not only is XML extensible, but XSLT allows reformatting from a particular vocabulary to another. Not that this is always perfect, but there are rules and tools and organizations to support it worldwide. IT should not be about making the programmer's task easier at any point (as many seem to think), but about giving the most utility to the data or information now and in the future.

Posted by Mark on 16 May 2008 @ 12:18pm UTC #

Jason,

XML obviously doesn't work for you. However, consider the following - VRML does have an XML equivalent called X3D which is gaining adherents precisely because it can be validated without writing your own grammar, because it can be generated from external content in other contexts, and because it is possible, once loaded, to modify the X3D content in a consistent manner without requiring a specialized API.

Writing a grammar is not hard. Writing a grammar + parser + documentation + distributing it to everyone who uses it is much harder. I've worked with configuration and INI files on many different operating systems over a thirty year span.

I've discovered what happens if you edit many of these text files in a text editor without a way of validating - you can quite readily end up doing things as diverse as royally screwing up your operating system, making it impossible for your video driver to actually render content to the screen, causing mechanical malfunctions and so forth, often times by doing something as simple as forgetting a tab or using the wrong number of spaces.

I've also spent hundreds of hours over the years trying to figure out cryptic man files that existed as the only documentation for a given command trying to figure what exactly the grammar being used actually was.

You're free to use whatever format you so choose - but I'd rather take the slight verbosity of XML over a poorly written and deficient configuration grammar anytime.

Posted by Kurt Cagle on 18 May 2008 @ 07:00pm UTC #

I'm just wondering what the argument is in favour of XML vs YAML?

I switched to YAML a few years ago and it got rid of a ridiculous number of XML-caused headaches.

Posted by engtech on 22 May 2008 @ 11:17pm UTC #

1. Lisp is a better answer than XML. Lisp allows you to store data in exactly the same manner as XML. You still get the parsing for free, but you can also manipulate it to your heart's content.

2. Yes. Unequivocally.

3. Probably not :P

4. One syntax for data AND programming seems pretty easy to me.

Posted by Jake Voytko on 23 May 2008 @ 03:58pm UTC #

Re. Lars Marius' quote from James Clark "Any damn fool could produce a better data format than XML", I really ought to know better than to second-guess James's intention, but I read that as "If we'd wanted to invent a data format, we wouldn't have invented XML".

I read Jeff Atwood's article too, and it's just the same old, same old. I'm not a programmer, I'm a publisher, and for me XML is nothing short of revolutionary, because of what it has enabled: the separation of form from content; self-describing documents; human readability; easy extensibility.

Can someone please explain to computer programmers that XML was not created to make their lives easier, and that it is a text format, not a programming language (at least in its narrower sense). So to Jason and his pals, I say stop whining: it was tedious five years ago, when there was still a debate. "My VRML97 grammar implementation was smaller, quicker, and more correct than any XML-based "solution"". Well, I bow before your godlike genius, but don't ever ask me for a job: I like my staff to live within commuting distance of Planet Earth.

If you want to compare XML with something, a comparison with RTF might be instructive:

Try loading a single RTF file into the last 3 major releases of Microsoft Word (let alone WordPerfect) and see what you get.
Try writing an RTF document in NotePad.
Try re-purposing an RTF document.
Try stripping the formatting from an RTF document.

Then there is the document quality angle. I've written and distributed dozens of Word templates over the years, some of which have done some quite fancy stuff. When I got the documents back, authors had typically ignored the toolbar button I had created for applying a Heading 1 style with a single click, and instead laboriously manually formatted and aligned a paragraph of "Normal" text, because they "didn't like the way the other heading looked". So I end up with a 50 page document, every paragraph of which is styled Normal, which I'm supposed to auto-convert to a DTP file.

XML prevents that sort of abuse, enforces regular document structure, and embeds meaningful metadata/semantics into documents in a largely unobtrusive way. In doing so, it moved the re-use and re-purposing of content from fantasy island to reality city. That's a galactically big achievement in my book.

Posted by John Hanratty on 23 May 2008 @ 04:40pm UTC #

xml is designed to be (unequivocally) machine readable, and it does a pretty good job at that. Maybe some of its critics are young or have short memories but I can certainly remember spending way too long going nearly crazy trying to decode idiosyncratic data files, and, also, trying to invent data formats that wouldn't do the same to others.

If xml isn't easy to follow, maybe we need some better viewers and editors, both generic and purpose-built. I don't know if there's a spec for this already, but you could potentially have a "xlv" viewer spec - like the xls content spec - that specifies good human views "hints" for an xml data scheme: styles, shapes, key elements, and so on. I can't see why anyone should really be editing xml tags - well, sometimes you've got to work at the low level - there's really got to be better ways to display and interact with data. But the xml can and should remain the definitive statement of the data.

If the tree data model doesn't really work with your data, you have to either get off xml and lose its universality and robustness, or transform the data into its natural shape before you even look at it. If you're working in a dumb text editor you're up against it.

Posted by Jim Birch on 02 Jun 2008 @ 02:29am UTC #

Why no mention of binary XML? Doesn't that get rid of the angle tax?

Posted by Rob on 11 Jul 2008 @ 06:39pm UTC #