<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
    
    
    
    
    
    
    
    
    
<title>Inventing XML Languages</title><biblioid class="uri">http://norman.walsh.name/2006/01/17/xmlLanguages</biblioid>
<volumenum>9</volumenum>
<issuenum>9</issuenum>
<pubdate>2006-01-17T10:17:07-05:00</pubdate>
<date>$Date: 2006-01-17 11:22:07 -0500 (Tue, 17 Jan 2006) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2006</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>My two cents on the controversy Tim recently stirred up on XML
language creation.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#Microformats"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
</info>

<epigraph>
<attribution>
      <personname>
	<firstname>T. E.</firstname>
<surname>Hulme</surname>
      </personname>
    </attribution>
<para xml:id="p2">Language is by its very nature a
communal thing; that is, it expresses never the exact thing but a
compromise—that which is common to you, me, and everybody.</para>
</epigraph>

<para xml:id="p1"><foaf:name>Tim Bray</foaf:name>’s 
<citetitle xlink:href="http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages">Don't Invent XML Languages</citetitle> 
(and its companion essay,
<citetitle xlink:href="http://www.tbray.org/ongoing/When/200x/2006/01/09/On-XML-Language-Design">On XML Language Design</citetitle>)
reflect mostly the content of his presentation
at <link xlink:href="http://2005.xmlconference.org/">XML 2005</link>, so
there wasn't anything in them
that surprised me. And basically, I'm inclined to agree with Tim.</para>

<para xml:id="p3">However, in the past week or so I've read several essays
critical of Tim's position (from bright folks like
<link xlink:href="http://copia.ogbuji.net/blog/2006-01-12/Learn_how_">
      <foaf:name>Uche Ogbuji</foaf:name>
    </link>, 
<link xlink:href="http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=35892ba2-c37c-476a-83b5-e721fc1a3b36">
      <foaf:name>Dare Obasanjo</foaf:name>
    </link>,
<link xlink:href="http://dannyayers.com/archives/2006/01/10/new-data-languages-harmful/">
      <foaf:name>Danny Ayers</foaf:name>
    </link>, and probably others).
As someone who, at one level or another, <emphasis>makes a living</emphasis>
inventing new XML languages, I wonder if I shouldn't be more critical too?</para>

<para xml:id="p4">Well, first off, Tim is pretty clearly talking about inventing
<emphasis>global standard</emphasis> XML languages; those are
the ones that are “boring, political, time-consuming, [and]
unglamorous” to create. I don't think he's talking about the XML
format you use for configuring your application or the one that your
application uses internally as a normal form for some data. Developing
those is mostly boring, usually non-political, pretty fast, and
unglamorous. But I don't think anyone's trying to dissuade you from
inventing them. If you have the choice between storing some
application data in XML or something else, I want you to use
XML.</para>

<para xml:id="p5">Tim goes on to argue for a bit about why it's painful, risky,
and expensive to invent a new standard. Been there, done that. He's
right.</para>

<para xml:id="p6">Finally, there's a list of “the big five”: well-established
standards that you'd be better off using than reinventing. DocBook
made the list, so on that level I'm a happy camper. (heh!)</para>

<para xml:id="p7">It struck me on reading this list that it was a pretty good
cross-section of the sorts of things likely to be the targets of
reinvention. (It helps, I'm sure, that Tim and I have similar sorts of
backgrounds.) I can easily imagine someone writing an authoring
tool being tempted to develop a new standard for documentation,
someone writing a business package being tempted to develop a new
purchase order standard, or some social software startup trying to
create a new format for sending around little bits of information that
will be updated regularly. The big five cover those areas, so use them instead.</para>

<para xml:id="p8">I think the odds of someone considering a reinvention of XSLT or
MathML or SVG or RDF or Topic Maps (or any of a wide range of other
specialized vocabularies) are a lot smaller, if for no other reason
than that the market for those sorts of languages is a lot smaller.
But the same
lesson would apply: don't. If there's a standard out there that is
already widely accepted and fits your needs reasonably well, try
really hard to use it, taking advantage of its extension mechanisms if
you can, before you reinvent it.
“<link xlink:href="http://en.wikipedia.org/wiki/Not_invented_here">Not invented here</link>”
is <emphasis>never</emphasis> an acceptable reason to reinvent.</para>

<section xml:id="microformats">
<title>Microformats</title>

<para xml:id="p9">If I have a bone to pick with Tim, it's his
<link xlink:href="http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages#p-9">hearty endorsement</link> of
<link xlink:href="http://en.wikipedia.org/wiki/Microformats">microformats</link>,
as if putting the angle brackets around the names was the hard
part in language design. Designing a microformat is designing a language
and is subject to all the same pitfalls. It just happens that the
microformats developed so far have been relatively small and developed
by small groups of like-minded individuals.
Both of those conditions mitigate the problems of
language design.</para>

<para xml:id="p10">I don't want to come across as some sort of curmudgeon opposed to
microformats for opposition's sake, but I do
<link xlink:href="/2005/09/05/microformats">have some concerns</link>.
</para>
</section>

<section xml:id="mustIgnore">
<title>Must ignore</title>

<para xml:id="p11">One of the driving forces behind microformats, as I see it, is
the browsers' implementation of “must ignore” semantics on
unrecognized markup. This is generally praised as an almost
universally good thing, and I have no doubt that it is
<emphasis>often</emphasis> a good thing. It was an important
stepping stone in the development of the modern web browser.</para>

<para xml:id="p12">But let's also recognize that it has forced us into a cul-de-sac.
It's frightfully difficult to embed new markup in HTML because
it just gets ignored. We have to shoehorn our extensions into existing
markup.</para>

<para xml:id="p13">Consider the proposal for a
“<link xlink:href="http://microformats.org/wiki/geo">geo</link>” microformat
(not because I think there's anything wrong with it, just because
I happen to have it open in a browser tab at the moment).</para>

<para xml:id="p14">Suppose I wanted to identify the location of the
<link xlink:href="http://en.wikipedia.org/wiki/Eiffel_tower">Eiffel Tower</link>.
In an ideal world, I could say:</para>

<programlisting>&lt;p&gt;The Eiffel Tower is located at
&lt;geo xmlns="http://example.org/geo" lat="48.8589" long="2.2958"/&gt;.
&lt;/p&gt;</programlisting>

<para xml:id="p15">But this isn't an ideal world and any browser which didn't use
my stylesheet would display that like this:</para>

<blockquote>
<para xml:id="p16">The Eiffel Tower is located at</para>
</blockquote>

<para xml:id="p17">Not terribly useful. If instead the browser could be coerced
(through some form of “must understand” perhaps) to display the markup
it didn't recognize:</para>

<blockquote>
<para xml:id="p18">The Eiffel Tower is located at &lt;geo lat="48.8589" long="2.2958"/&gt;</para>
</blockquote>

<para xml:id="p19">it would be at least practical to use the new markup. But that's
not the case, so instead we have to resort to markup like this:</para>

<programlisting>&lt;p&gt;The Eiffel Tower is located at 
&lt;span class="geo"&gt;
 &lt;abbr class="latitude" title="48.8589"&gt;48° 51' 32" N&lt;/abbr&gt; 
 &lt;abbr class="longitude" title="2.2958"&gt;002° 17' 45" E&lt;/abbr&gt;
&lt;/span&gt;.&lt;/p&gt;</programlisting>

<para xml:id="p20">Which is hard to validate and prone to error. I stand by what
I said before: if you want to embed data in your documents, embed data.
Transforming to a microformat for presentation is a good thing, but
I can't recommend microformats for authoring.</para>
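
<para>As a rough sketch of that last point, here is one way the
transformation from embedded data to a microformat might look in XSLT 1.0.
It assumes the hypothetical <literal>geo</literal> element and
<literal>http://example.org/geo</literal> namespace from the example above,
copies everything else through unchanged, and emits the same
<literal>span</literal> and <literal>abbr</literal> pattern. The
<literal>dms</literal> template name and the rounding to whole seconds are
illustrative choices, not part of any specification. The result elements are
emitted in no namespace to match the unqualified <literal>p</literal> in the
example; an XHTML page would need the XHTML namespace declared on the
stylesheet.</para>

<programlisting>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:g="http://example.org/geo"
    exclude-result-prefixes="g"&gt;

  &lt;!-- Identity template: copy everything that is not a geo element. --&gt;
  &lt;xsl:template match="@*|node()"&gt;
    &lt;xsl:copy&gt;
      &lt;xsl:apply-templates select="@*|node()"/&gt;
    &lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Replace each geo element with the span/abbr microformat markup. --&gt;
  &lt;xsl:template match="g:geo"&gt;
    &lt;span class="geo"&gt;
      &lt;abbr class="latitude" title="{@lat}"&gt;
        &lt;xsl:call-template name="dms"&gt;
          &lt;xsl:with-param name="value" select="@lat"/&gt;
          &lt;xsl:with-param name="pos" select="'N'"/&gt;
          &lt;xsl:with-param name="neg" select="'S'"/&gt;
        &lt;/xsl:call-template&gt;
      &lt;/abbr&gt;
      &lt;xsl:text&gt; &lt;/xsl:text&gt;
      &lt;abbr class="longitude" title="{@long}"&gt;
        &lt;xsl:call-template name="dms"&gt;
          &lt;xsl:with-param name="value" select="@long"/&gt;
          &lt;xsl:with-param name="pos" select="'E'"/&gt;
          &lt;xsl:with-param name="neg" select="'W'"/&gt;
        &lt;/xsl:call-template&gt;
      &lt;/abbr&gt;
    &lt;/span&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Hypothetical helper: decimal degrees to degrees, minutes, seconds. --&gt;
  &lt;xsl:template name="dms"&gt;
    &lt;xsl:param name="value"/&gt;
    &lt;xsl:param name="pos"/&gt;
    &lt;xsl:param name="neg"/&gt;
    &lt;!-- Absolute value; XPath 1.0 has no abs() function. --&gt;
    &lt;xsl:variable name="abs"&gt;
      &lt;xsl:choose&gt;
        &lt;xsl:when test="$value &amp;lt; 0"&gt;&lt;xsl:value-of select="-$value"/&gt;&lt;/xsl:when&gt;
        &lt;xsl:otherwise&gt;&lt;xsl:value-of select="$value"/&gt;&lt;/xsl:otherwise&gt;
      &lt;/xsl:choose&gt;
    &lt;/xsl:variable&gt;
    &lt;!-- Round to whole seconds first so minutes and seconds never reach 60. --&gt;
    &lt;xsl:variable name="total" select="round($abs * 3600)"/&gt;
    &lt;xsl:value-of select="floor($total div 3600)"/&gt;
    &lt;xsl:text&gt;° &lt;/xsl:text&gt;
    &lt;xsl:value-of select="floor(($total mod 3600) div 60)"/&gt;
    &lt;xsl:text&gt;' &lt;/xsl:text&gt;
    &lt;xsl:value-of select="$total mod 60"/&gt;
    &lt;xsl:text&gt;" &lt;/xsl:text&gt;
    &lt;!-- Pick the hemisphere letter from the sign of the original value. --&gt;
    &lt;xsl:choose&gt;
      &lt;xsl:when test="$value &amp;lt; 0"&gt;&lt;xsl:value-of select="$neg"/&gt;&lt;/xsl:when&gt;
      &lt;xsl:otherwise&gt;&lt;xsl:value-of select="$pos"/&gt;&lt;/xsl:otherwise&gt;
    &lt;/xsl:choose&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>

<para>With an XSLT 1.0 processor such as xsltproc, this could be run as
<command>xsltproc geo.xsl page.xml</command>, where both file names are
placeholders. Because the human-readable text is computed from the same
attribute that fills <literal>title</literal>, the two cannot drift apart
the way hand-authored values can.</para>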

<para xml:id="p21">I can't recommend inventing new XML languages either, unless you
have to. And sometimes you have to.</para>
</section>
</essay>

