<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       xmlns:foaf="http://xmlns.com/foaf/0.1/"
       xml:lang="en"
       version='5.0'>
<info>
<title>Defending the tax</title>
<volumenum>11</volumenum>
<issuenum>45</issuenum>
<pubdate>2008-05-13T08:31:03-04:00</pubdate>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2008</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Not a political tax, the angle bracket tax.</para>
</abstract>
</info>

<para xml:id='p1'>I've spent a couple of days trying to decide if I want
to respond to <personname><firstname>Jeff</firstname>
<surname>Atwood</surname></personname>’s
<link xlink:href="http://www.codinghorror.com/blog/archives/001114.html">swipe</link>
at <wikipedia>XML</wikipedia>.
There's a fairly substantial part of my brain that says “just leave
it alone”. But I guess the fact that you're reading this proves I didn't
listen.</para>

<para xml:id='p2'>Jeff's swipe is motivated by a couple of examples, so let's start with
them. First off, there's SOAP. There's no question that SOAP is noisy. It's
not hard to see why: it was designed by a fairly large committee, and it
was designed to solve a pretty big, complex problem. (Maybe a big complex
problem that next to no one actually <emphasis>has</emphasis>; I'm not
a big fan of the WS-* stack, but that's a different issue.)</para>

<para xml:id='p3'>No one holds up the <wikipedia>Yugo</wikipedia> or the
<wikipedia>Edsel</wikipedia> as marvels of modern automobile engineering,
but by the same token, few people suggest that cars are a bad idea
<emphasis>just because</emphasis>
some cars are badly designed.</para>

<para xml:id='p4'>Next up, Jeff tries to show how much better RFC 822 is for email.
There's no question that it's more compact; I could learn to author email
in XML, but I'm not anxious to do it. On the other hand, once you want to
search and analyze a large archive of mail, it's pretty obvious that XML
<link xlink:href="http://www.markmail.org/">is actually better</link>.
</para>

<para xml:id='p5'>Jeff summarizes with a perfectly reasonable statement:</para>

<blockquote>
<para xml:id='p6'>I don't necessarily think XML sucks, but the mindless, blanket
application of XML as a dessert topping and a floor wax certainly
does. Like all tools, it's a question of how you use it.</para>
</blockquote>

<para xml:id='p7'>I can't really disagree with that. XML may be my hammer of
choice, but I don't hang picture hooks with a sledgehammer.</para>

<para xml:id='p8'>If your data is <emphasis>really</emphasis> simple, maybe just
a set of key/value pairs, and if both the key and the value are strings,
and if the consequences of bad data are negligible, and if there's no
possibility that there will ever be any additional complexity, then sure,
maybe a flat text file is all you need.</para>

<para xml:id='p9'>On the other hand, the difference between:</para>

<programlisting>fruit=pear
vegetable=carrot
topping=wax</programlisting>

<para xml:id='p10'>and</para>

<programlisting><![CDATA[<doc>
<fruit>pear</fruit>
<vegetable>carrot</vegetable>
<topping>wax</topping>
</doc>]]></programlisting>

<para xml:id='p11'>isn't really that large, is it? (Or maybe you think it is,
<foreignphrase>de gustibus non est disputandum</foreignphrase>.)
Except, of course, that in the XML case, you don't have to write or
maintain the code for the parser, unit tests for the parser, or
documentation for the parser in every language (programming and human)
and on every platform supported by your application. Nor do you have to
worry about how to parse the file when the data contains spaces, newlines,
or Chinese
characters. And some day, when the data is just a tiny bit more
complex, you won't have to devise some clever hack for extending the
format. You'll just use XML.</para>
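<para>To make that concrete, here's a small sketch in Python (an arbitrary choice for illustration; nothing in the argument depends on it). It compares a hypothetical hand-rolled reader for the <code>key=value</code> format with the standard library's <code>xml.etree.ElementTree</code>, using invented sample values that include Chinese characters and a line break. The names <code>parse_flat</code>, <code>to_flat</code>, <code>to_xml</code>, and <code>parse_xml</code> are made up for this example.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

# Hypothetical hand-rolled reader for the flat key=value format:
# one pair per line, split on the first "=".
def parse_flat(text):
    data = {}
    for line in text.splitlines():
        key, value = line.split("=", 1)  # raises ValueError if a line has no "="
        data[key] = value
    return data

def to_flat(data):
    return "\n".join(f"{key}={value}" for key, value in data.items())

# The same data handled by the standard library's XML serializer and parser.
def to_xml(data):
    root = ET.Element("doc")
    for key, value in data.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")

def parse_xml(text):
    root = ET.fromstring(text)
    return {child.tag: child.text or "" for child in root}

# Invented sample values: one plain, one in Chinese, one spanning two lines.
data = {
    "fruit": "pear",
    "vegetable": "胡萝卜",
    "topping": "wax\nfloor polish",
}

# XML round-trips the data unchanged.
assert parse_xml(to_xml(data)) == data

# The flat reader breaks as soon as a value contains a newline.
try:
    parse_flat(to_flat(data))
except ValueError as err:
    print("flat parser failed:", err)]]></programlisting>

<para>The flat reader could be patched, of course, perhaps by inventing an escape for newlines. But every such patch is more code to write, test, and document, and the XML parser already handles the case.</para>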

<para xml:id='p12'>Let's consider another example: RELAX NG has both an XML
syntax and a compact (non-XML) syntax. It's possible to author in both
of them, and you can translate from one to the other without any loss
of data (and with minimal loss of formatting).</para>

<para xml:id='p13'>The consequence? Honestly? I author mostly in the compact
syntax. Nevertheless, I absolutely rely on the XML syntax, because it
makes the entire schema amenable to processing with an enormous range
of XML tools: general-purpose tools that work
equally well with RELAX NG and other XML languages. Tools that I did not
have to write, test, debug, or document.</para>
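<para>As an illustration of that kind of reuse, the sketch below uses only Python's generic <code>xml.etree.ElementTree</code> API, which knows nothing about RELAX NG, to list the element names declared in a schema written in the XML syntax. The schema is a made-up one describing the fruit/vegetable/topping document above; <uri>http://relaxng.org/ns/structure/1.0</uri> is the RELAX NG namespace. The hypothetical <code>declared_elements</code> function handles only declarations that use the <code>name</code> attribute.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# XML syntax of a small, made-up schema. In the compact syntax it would read:
#   element doc { element fruit { text }, element vegetable { text },
#                 element topping { text } }
SCHEMA = f"""<element name="doc" xmlns="{RNG_NS}">
  <element name="fruit"><text/></element>
  <element name="vegetable"><text/></element>
  <element name="topping"><text/></element>
</element>"""

def declared_elements(schema_text):
    """Return the names of elements declared as <element name="...">."""
    root = ET.fromstring(schema_text)
    # iter() walks the tree in document order, including the root itself.
    return [el.get("name") for el in root.iter(f"{{{RNG_NS}}}element")]

print(declared_elements(SCHEMA))  # → ['doc', 'fruit', 'vegetable', 'topping']]]></programlisting>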

<para xml:id='p14'>The lesson, if there's a lesson, is that even if you think a
non-XML syntax is better for one purpose or another, the ability to
translate into (and back out of) an XML syntax is a good thing. Of course,
devising two syntaxes, and making them isomorphic, and making it possible
to translate back and forth without destroying one format or the other,
is a huge amount of work. It's usually easier to just use XML.</para>

<para xml:id='p15'>Jeff points out:</para>

<blockquote>
<para xml:id='p16'>You could do worse than XML. It's a reasonable choice, and if
you're going to use XML, then at least learn to use it correctly.</para>
</blockquote>

<para xml:id='p17'>No argument from me there. Jeff follows that with a few questions,
so I'll ask a few of my own.</para>

<orderedlist>
<listitem>
<para xml:id='p18'>Is there <emphasis>really</emphasis> a better default choice than XML?</para>
</listitem>
<listitem>
<para xml:id='p19'>Are you so confident that your intended use is never going to require
any additional complexity that you're willing to bet against XML? Are you
sure you'll never want any sort of validation or internationalization support?</para>
</listitem>
<listitem>
<para xml:id='p20'>Do any of the XML alternatives actually have sufficient traction? (Maybe 
the answer to this question is yes. If <wikipedia>JavaScript</wikipedia> is your only platform of
interest, for example, then <wikipedia>JSON</wikipedia> may be a reasonable choice for some data,
<link xlink:href="http://www.ibm.com/developerworks/xml/library/x-xml2008prevw.html#N10083">security issues</link> notwithstanding.)</para>
</listitem>
<listitem>
<para xml:id='p21'>Wouldn't it be nice to have easily readable, understandable data
and configuration files, without inflicting yet another random, ad hoc
syntax on your ever-lovin' mind?</para>
</listitem>
</orderedlist>

<para xml:id='p22'>I don't necessarily think all the alternatives to XML suck, but the
mindless, knee-jerk rejection of XML because it contains a small amount of
additional syntax certainly does. Like all tools, it's a question of how you use it.
Please think twice before subjecting yourself, your fellow
programmers, and your users to more fragile, ASCII-only,
ad hoc syntaxes.</para>

</essay>
