Not a political tax, the angle bracket tax.
I've spent a couple of days trying to decide if I want to respond to Jeff Atwood’s swipe at XML. There's a fairly substantial part of my brain that says “just leave it alone”. But I guess the fact that you're reading this proves I didn't listen.
Jeff's swipe is motivated by a couple of examples, so let's start with them. First off, there's SOAP. There's no question that SOAP is noisy. It's not hard to see why: it was designed by a fairly large committee, and it was designed to solve a pretty big, complex problem. (Maybe a big complex problem that next to no one actually has; I'm not a big fan of the WS-* stack, but that's a different issue.)
No one holds up the Yugo or the Edsel as marvels of modern automobile engineering, but by the same token, few people suggest that cars are a bad idea just because some cars are badly designed.
Next up, Jeff tries to show how much better RFC 822 is for email. There's no question that it's more compact; I could learn to author email in XML, but I'm not anxious to do it. On the other hand, it's pretty obvious that XML is actually better.
Jeff summarizes with a perfectly reasonable statement:
“I don't necessarily think XML sucks, but the mindless, blanket application of XML as a dessert topping and a floor wax certainly does. Like all tools, it's a question of how you use it.”
I can't really disagree with that. XML may be my hammer of choice, but I don't hang picture hooks with a sledge hammer.
If your data is really simple, maybe just a set of key/value pairs, and if both the key and the value are strings, and if the consequences of bad data are negligible, and if there's no possibility that there will ever be any additional complexity, then sure, maybe a flat text file is all you need.
On the other hand, the difference between:
    fruit=pear
    vegetable=carrot
    topping=wax

and

    <doc>
    <fruit>pear</fruit>
    <vegetable>carrot</vegetable>
    <topping>wax</topping>
    </doc>
isn't really that large, is it? (Or maybe you think it is, de gustibus non est disputandum.) Except, of course, that in the XML case, you don't have to write or maintain the code for the parser, unit tests for the parser, or documentation for the parser in every language (programming and documentation), and for every platform, supported by your application. Nor do you have to worry about how to parse the file when the data contains spaces or new lines or Chinese characters. And some day, when the data is just a tiny bit more complex, you won't have to devise some clever hack for extending the format. You'll just use XML.
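To make that concrete, here's a minimal sketch in Python of reading the XML version with nothing but the standard library's `xml.etree.ElementTree`. The sample values (a space, an embedded newline, a Chinese character) and the `load_settings` helper are my own illustrative assumptions, not part of any real application.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the example document, but with values that
# contain a space, a newline, and a Chinese character.
XML_DOC = """<doc>
<fruit>Asian pear</fruit>
<vegetable>carrot,
diced</vegetable>
<topping>蜡</topping>
</doc>"""


def load_settings(xml_text):
    """Return a dict mapping each child element's name to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


settings = load_settings(XML_DOC)
print(settings)
```

The parser deals with the whitespace and the non-ASCII text on its own; the only code specific to this format is the one-line dictionary comprehension.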
Let's consider another example: RELAX NG has both an XML syntax and a compact (non-XML) syntax. It's possible to author in both of them, and you can translate from one to the other without any loss of data (and with minimal loss of formatting).
The consequence? Honestly? I author mostly in the compact syntax. Nevertheless, I absolutely rely on the XML syntax because having the XML syntax makes the entire schema amenable to processing with an enormous range of XML tools. General purpose tools that work equally well with RELAX NG and other XML languages. Tools that I did not have to write, test, debug, or document.
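As a rough illustration of what "amenable to processing" means, here's a small sketch that uses a general-purpose XML parser (Python's `xml.etree.ElementTree`) to list the elements a RELAX NG schema declares. The schema is a hypothetical one I made up for the fruit/vegetable/topping document, and `declared_elements` is an illustrative helper; the only RELAX NG-specific knowledge in it is the namespace URI.

```python
import xml.etree.ElementTree as ET

# Namespace of the RELAX NG XML syntax.
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hypothetical schema, written in the XML syntax, for the small
# fruit/vegetable/topping document.
SCHEMA = """<element name="doc" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="fruit"><text/></element>
  <element name="vegetable"><text/></element>
  <element name="topping"><text/></element>
</element>"""


def declared_elements(schema_text):
    """Return the names of all elements the schema declares, in document order."""
    root = ET.fromstring(schema_text)
    # iter() visits the root element too, so "doc" is included.
    return [el.get("name") for el in root.iter("{%s}element" % RNG_NS)]


names = declared_elements(SCHEMA)
print(names)
```

Doing the same thing against the compact syntax would mean writing (or finding) a parser for that syntax first.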
The lesson, if there's a lesson, is that even if you think a non-XML syntax is better for one purpose or another, the ability to translate into (and back out of) an XML syntax is a good thing. Of course, devising two syntaxes, and making them isomorphic, and making it possible to translate back and forth without destroying one format or the other, is a huge amount of work. It's usually easier to just use XML.
Jeff points out:
“You could do worse than XML. It's a reasonable choice, and if you're going to use XML, then at least learn to use it correctly.”
No argument from me there. Jeff follows that with a few questions, so I'll ask a few of my own.
Is there really a better default choice than XML?
Are you so confident that your intended use is never going to require any additional complexity that you're willing to bet against XML? Are you sure you'll never want any sort of validation or internationalization support?
Wouldn't it be nice to have easily readable, understandable data and configuration files, without inflicting yet another random, ad hoc syntax on your ever-lovin' mind?
I don't necessarily think all the alternatives to XML suck, but the mindless, knee-jerk rejection of XML because it contains a small amount of additional syntax certainly does. Like all tools, it's a question of how you use it. Please think twice before subjecting yourself, your fellow programmers, and your users to more fragile, ASCII-only, ad hoc syntaxes.