<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       xml:lang="en"
       version='lillet'>
<info>
<title>Supporting Microformats</title>
<volumenum>8</volumenum>
<issuenum>116</issuenum>
<pubdate>2005-09-05T14:37:24-04:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2005</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Microformats, a technique for embedding machine-readable data
in human-readable formats, are growing in popularity. I've added support
for the hCalendar microformat in travel itineraries, but I'm not
optimistic about the technique.</para>
</abstract>
</info>

<para xml:id='p1'><personname><firstname>Dan</firstname>
<surname>Connolly</surname></personname> and I
<link xlink:href="/2005/itinerary/09-18-tagxsl">are travelling</link>
to <link xlink:href="http://en.wikipedia.org/wiki/Edinburgh">Edinburgh</link>
later this month for a
<link xlink:href="http://www.w3.org/2001/tag/">TAG</link> meeting.
In the course of looking at our respective online schedules, we got to
chatting about
<link xlink:href="http://en.wikipedia.org/wiki/Microformats">microformats</link>,
specifically
<link xlink:href="http://microformats.org/wiki/hcalendar">hCalendar</link>.
Dan's been experimenting with using it and I've obviously got
calendar data on the web, so you'd think I could use it too.</para>

<para xml:id='p2'>One of the reasons this blog exists is so that I
have a place to experiment, so I spent a few hours one evening last
week tinkering to get hCalendar supported on <link
xlink:href="/2005/itinerary/">my itineraries</link> pages. It turned
out to be a little tricky because that page doesn't have all the
detailed information needed to generate the event data, but I managed
to work around that. Right now, only a few of the events are actually
formatted with hCalendar, but over time I'll probably get all of them
into that format.</para>

<para xml:id='p3'>Microformats are becoming quite popular. Old-timers
like myself recognize that these are what we used to call
“architectural forms” being reinvented. Exactly what constitutes a
microformat is probably open to debate. On one end of the scale there
are really simple things, like adding a <tag
class="attribute">rel</tag> attribute to anchor tags, and on the
other, considerably more complex things like
<link xlink:href="http://microformats.org/wiki/hcalendar">hCalendar</link>
and 
<link xlink:href="http://microformats.org/wiki/hcard">hCard</link>, which
have nested structure.</para>
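
<para>To make the two ends of that scale concrete, here is a rough
sketch in XHTML. The link target and the event details are made up for
illustration and are not taken from my pages; the class names
(<literal>vevent</literal>, <literal>summary</literal>,
<literal>dtstart</literal>, <literal>dtend</literal>, and
<literal>location</literal>) are the ones hCalendar borrows from
iCalendar.</para>

<programlisting><![CDATA[<!-- Simple end of the scale: a rel value on an ordinary anchor.
     The URI is a placeholder. -->
<a href="http://example.org/tags/microformats" rel="tag">microformats</a>

<!-- Complex end of the scale: an hCalendar event with nested
     structure. The event itself is hypothetical. -->
<div class="vevent">
  <span class="summary">Example meeting</span>:
  <abbr class="dtstart" title="2005-09-20T09:00:00">9am</abbr> to
  <abbr class="dtend" title="2005-09-20T17:00:00">5pm</abbr>,
  <span class="location">Example City</span>
</div>]]></programlisting>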

<para xml:id='p4'>The idea develops pretty naturally. You start with
some markup vocabulary (DocBook, XHTML, whatever you have lying
around) that has an attribute that's used for specialization. In
DocBook, we call it <tag class="attribute">role</tag>. In XHTML, it's
called <tag class="attribute">class</tag>. You use it when you want to
distinguish two pieces of data that are marked up with the same
element.</para>
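
<para>As a minimal sketch, the same specialization looks like this in
each vocabulary. The <literal>keyboard-shortcut</literal> value is a
made-up example, not a standardized one.</para>

<programlisting><![CDATA[<!-- DocBook: two phrase elements, one specialized with role -->
<para>Press <phrase role="keyboard-shortcut">Ctrl-S</phrase> to save
your <phrase>work</phrase>.</para>

<!-- XHTML: the same distinction, made with class -->
<p>Press <span class="keyboard-shortcut">Ctrl-S</span> to save
your <span>work</span>.</p>]]></programlisting>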

<para xml:id='p5'>This works perfectly well on an ad hoc basis, and if
you pass the document to someone who isn't familiar with your
extensions, the fallback is natural and obvious.</para>

<para xml:id='p6'>Microformats (and architectural forms, and all the
other names under which this technique has been invented) take this
one step further by standardizing some of these attribute values and
possibly even some combination of element types and attribute values
in one or more content models.</para>

<para xml:id='p7'>This technique has some stellar advantages: it's
relatively easy to explain, the fallback is natural and obvious, and
new code can be written to use this “extra” information without any
change to existing applications; they just ignore it.</para>

<para xml:id='p8'>Despite how compelling those advantages are, there
are some pretty serious drawbacks associated with microformats as
well. Adding hCalendar support to my itineraries page reinforced
several of them.</para>

<orderedlist>
<listitem>
<para xml:id='p9'>They're not very flexible. While I was able to add
hCalendar to the overall itinerary page, I can't add it to the
individual pages because they don't use the right markup. I'm not
using <tag>div</tag> and <tag>span</tag> to mark up the individual
appointments, so I can't add hCalendar to them.</para>
</listitem>

<listitem>
<para xml:id='p10'>I don't think they'll scale very well.
Microformats rely on the existing extensibility point, the
<tag class="attribute">role</tag> or
<tag class="attribute">class</tag> attribute. As such, they consume
that extensibility point, leaving me without one for any other use
I may have.</para>
</listitem>

<listitem>
<para xml:id='p11'>They're devilishly hard to validate.
<link xlink:href="http://en.wikipedia.org/wiki/Document_Type_Definition">DTDs</link>
and
<link xlink:href="http://en.wikipedia.org/wiki/XML_Schema">W3C XML Schema</link>
are right out for validating microformats, because neither can select
a content model based on an attribute value. Of course,
<link xlink:href="http://en.wikipedia.org/wiki/Schematron">Schematron</link>
(and other rule-based validation languages) can do it, but most of us
are used to using grammar-based validation on a daily basis and we're
likely to forget the extra step of running Schematron
validation.</para>

<para xml:id='p12'>It's interesting that
<link xlink:href="http://en.wikipedia.org/wiki/RELAX_NG">RELAX NG</link>
can almost, but not quite, do it. RELAX NG has no difficulty
distinguishing between two patterns based on an attribute value, but
you can't use those two patterns in an interleave pattern, because both
match elements with the same name. So in the
general case, where you want to say that the content of one of these
special elements is “an <tag>abbr</tag> with
<literal>class="dtstart"</literal> interleaved with an <tag>abbr</tag>
with <literal>class="dtend"</literal> interleaved with…”, you're out
of luck. If you can limit the content to something that doesn't
require interleaving, you can use RELAX NG for your particular
application, but most of the microformats I've seen use interleaving
in the general case.</para>

<para xml:id='p13'>Is validation really important? Well, I have well
over a decade of experience with markup languages at this point and I
was reminded just last week that I can't be relied upon to write a
simple HTML document without markup errors if I don't validate it. If
microformats can't be validated, they will often be incorrect.</para>
</listitem>
</orderedlist>
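
<para>To illustrate the RELAX NG limitation from the last item, here
is a sketch of the pattern one would like to write, in RELAX NG's XML
syntax. The surrounding <tag>div</tag> pattern, the absence of a
namespace, and the use of plain text content are simplifying
assumptions on my part. Each <tag>element</tag> pattern is fine on its
own; as I understand the restrictions on interleave, the schema as a
whole is rejected because both operands of the <tag>interleave</tag>
can match an element named <tag>abbr</tag>.</para>

<programlisting><![CDATA[<element name="div" xmlns="http://relaxng.org/ns/structure/1.0">
  <attribute name="class"><value>vevent</value></attribute>
  <!-- Not allowed: both operands of interleave contain an element
       pattern named "abbr", so their name classes overlap. -->
  <interleave>
    <element name="abbr">
      <attribute name="class"><value>dtstart</value></attribute>
      <attribute name="title"/>
      <text/>
    </element>
    <element name="abbr">
      <attribute name="class"><value>dtend</value></attribute>
      <attribute name="title"/>
      <text/>
    </element>
  </interleave>
</element>]]></programlisting>

<para>Replacing the <tag>interleave</tag> with a <tag>group</tag>
makes the schema legal, but only by fixing the order of the two
elements.</para>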

<para xml:id='p14'>At the end of the day, I'm not a fan of
microformats, at least not on the complex end of the spectrum. There
are undoubtedly lots and lots of situations where they're the only
practical answer, but if you don't <emphasis>have</emphasis> to use
them, I'm not sure you should.</para>

<para xml:id='p15'>If you want to embed data in your documents, embed
data. The XML source for the individual itinerary pages, for example,
doesn't use DocBook littered with <tag class="attribute">role</tag>
attributes to store itinerary information; it uses markup suited to
that purpose:</para>

<programlisting><![CDATA[<trip xmlns="http://nwalsh.com/rdf/itinerary#"
      startDate="2005-09-18T12:45:00" endDate="2005-09-30T23:59:59"
      trip="09-18-tagxsl">
   <itinerary>
      <leg class="flight">
         <startDate>2005-09-18T12:45:00</startDate>
         <endDate>2005-09-18T14:30:00</endDate>
         <description>BDL-RDU/AA 4695</description>
         <depart>#BDL</depart>
         <arrive>#RDU</arrive>
         <flight>4695</flight>
         <airline>American Airlines</airline>
      </leg>
      …]]></programlisting>

<para xml:id='p16'>I think that's a better answer when it's a
practical answer.</para>

</essay>


