<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="pto" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#">
<info>
<title>Is RDF/XML Good for Anything?</title><biblioid class="uri">http://norman.walsh.name/2004/07/30/rdfxml</biblioid>
<volumenum>7</volumenum>
<issuenum>136</issuenum>
<pubdate>2004-07-30T08:27:55-04:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2004</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>Having a standard transfer syntax for RDF is great.
XML is an ideal format for this sort of “core dump”: it’s amenable to
machine processing and it’s possible for a human being (with
sufficient skill, experience, and dedication) to look at it in a text
editor and “figure it out”.
So RDF/XML is good for RDF core dumps.
But is it something users should be writing by hand? I’m not sure.
</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#RDF"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
</info>

<para xml:id="p1">The title, I admit, is inflammatory, but I’m serious about the question.
I’ve been thinking about it ever since I read
<personname>
      <firstname>Dorothea</firstname>
      <surname>Salo</surname>
    </personname>’s
double-barrelled
<link xlink:href="http://cavlec.yarinareth.net/archives/2004/07/21/look-we-get-it-already/">response</link> to 
<personname>
      <firstname>Stefano</firstname>
      <surname>Mazzocchi</surname>
    </personname>’s
<link xlink:href="http://www.betaversion.org/~stefano/linotype/news/57/">guide</link>
to semantic web specs.</para>

<para xml:id="p2">Dorothea’s rant is both fun to read and absolutely right on.</para>

<para xml:id="p3">Chances are, you already know
this, but I’ll say it again anyway:
XML is designed to describe trees. Nested markup does this with
reasonable efficiency in exactly one way<footnote>
      <para xml:id="p4">“Exactly” is an
overstatement; there’s some variation because attributes are unordered
and there’s a small amount of variability in the syntax, but there’s
<emphasis>essentially</emphasis> one XML document that represents any
given tree.</para>
    </footnote>.
RDF is a collection of (subject, predicate, value) triples that, generally
speaking, form a graph.
RDF/XML is a transfer syntax for graphs. There are lots of ways to “flatten”
a graph into a tree. There will always be significant variation in the possible
RDF/XML serializations of an RDF graph.
</para>
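
<para>To make that variation concrete, here is a small sketch. The
<code>example.org</code> URIs and the choice of Dublin Core and FOAF properties
are made up for illustration. Both listings below describe the same four
triples: a project has a title and a creator, and the creator is a person
with a name. The first nests the person inside the project.</para>

<programlisting language="xml"><![CDATA[<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- Nested form: the creator is described inline -->
  <rdf:Description rdf:about="http://example.org/project">
    <dc:title>Example Project</dc:title>
    <dc:creator>
      <foaf:Person rdf:about="http://example.org/people/alice">
        <foaf:name>Alice</foaf:name>
      </foaf:Person>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>]]></programlisting>

<para>The second flattens the graph into two sibling descriptions, uses
attributes for the literal values, and spells out the type explicitly.</para>

<programlisting language="xml"><![CDATA[<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <!-- Flat form: each subject gets its own top-level description -->
  <rdf:Description rdf:about="http://example.org/people/alice"
                   foaf:name="Alice">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/project"
                   dc:title="Example Project">
    <dc:creator rdf:resource="http://example.org/people/alice"/>
  </rdf:Description>
</rdf:RDF>]]></programlisting>

<para>An RDF parser should produce the same graph from either one, but as XML
trees they have little in common, which is what makes a single grammar or
stylesheet for “the” RDF/XML form of a vocabulary hard to write.</para>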

<para xml:id="p5">Having a standard transfer syntax is great. The fact that
the PSVI, for example, <emphasis>doesn’t</emphasis> have one is a
common source of irritation. XML is a great transfer syntax for RDF.
XML is an ideal format for this sort of “core dump”: it’s amenable to
machine processing and it’s possible for a human being (with
sufficient skill, experience, and dedication) to look at it in a text
editor and “figure it out”.</para>

<para xml:id="p6">So RDF/XML is good for RDF core dumps.</para>

<para xml:id="p7">But is it something users should be writing by hand? I’m not sure.
And with impeccable timing, <personname>
      <firstname>Edd</firstname>
<surname>Dumbill</surname>
    </personname> enters the scene at this point
and announces <link xlink:href="http://usefulinc.com/doap/">DOAP</link>.</para>

<para xml:id="p8">DOAP is an RDF vocabulary for describing metadata about projects. 
I have lots of projects, and maintaining the metadata about them (web pages,
syndication feeds, freshmeat announcements, CVS tags, email announcements,
etc.) is tedious and error-prone. Having a standard way to represent this
data is a <emphasis>fabulous</emphasis> idea.</para>

<para xml:id="p9">So how should we store this metadata?</para>

<variablelist>
<varlistentry>
      <term>In RDF/XML</term>
<listitem>
<para xml:id="p10">One way would be to store the data directly in RDF/XML or some other
RDF transfer syntax. That’s (too) flexible and hard to validate. Besides, I’m
already suspicious that RDF/XML is for core dumps.</para>
</listitem>
</varlistentry>

<varlistentry>
      <term>In a DOAP XML Format</term>
<listitem>
<para xml:id="p11">Edd has taken a stab at making DOAP more palatable to the XML crowd
by providing a
<link xlink:href="http://www-106.ibm.com/developerworks/xml/library/x-osproj4/">RELAX NG
grammar</link> for DOAP files. That’s cool. Now he’s
got an XML format that just happens to be isomorphic to one possible RDF/XML
serialization. Does that really count? Yes, I think it does. How can I argue
that it doesn’t? It’s an XML format that I can edit with my normal RELAX NG-aware
editors.</para>
<para xml:id="p12">But I’m leaning towards making “announcement” essays for my projects,
so having this information in a separate file doesn’t seem right. Duplication
of information is bad.</para>
</listitem>
</varlistentry>

<varlistentry>
      <term>As Metadata in an Essay</term>
<listitem>
<para xml:id="p13">My next idea was to make the DOAP vocabulary a metadata vocabulary
to put in the essay’s <tag>info</tag> element, just like I currently
allow Dublin Core terms in there.</para>
<para xml:id="p14">I implemented that, and it worked, but it didn’t really solve the
duplication of information problem.</para>
</listitem>
</varlistentry>

<varlistentry>
      <term>As <emphasis>Data</emphasis> in an Essay</term>
<listitem>
<para xml:id="p15">At this point, I realized that I was going about this backwards.
An RDF-primary focus isn’t a very XML approach to the problem. What I should
do is put the information <emphasis>in the body of the essay</emphasis>
with enough markup to identify it.</para>
<para xml:id="p16">I <link xlink:href="/2004/projects/sxpipe">implemented that too</link>.
I ended up with a bunch of <tag class="attribute">role</tag> attributes
on phrases and links to achieve it. It’s not the most attractive markup,
but I haven’t thought deeply yet about what the right markup is. (One
interesting exception is the <tag>doap:license</tag> element, which I left
in the <tag>info</tag>. I can’t think of an element that has the right semantics:
preserve the link, but don’t display the URI or make the link “hot”.)
</para>
<para xml:id="p17">One thing that did occur to me, and that I think will occur to a lot
of you, is putting the DOAP markup right in the essay. Instead of saying
“<code>&lt;phrase role="doap.name"&gt;name&lt;/phrase&gt;</code>”, say
“<code>&lt;doap:name&gt;name&lt;/doap:name&gt;</code>”. I see three problems with that
approach: first, it doesn’t work for all the DOAP structures because some
of them are nested; second, I’d have to define the prose processing expectations for
all those elements; and third, everyone looking at my XML would have to understand
all those elements. Using standard elements with roles makes document interchange
easier.</para>
</listitem>
</varlistentry>
</variablelist>
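
<para>As a rough sketch of the “data in the essay” approach, a paragraph in
an announcement might look something like the following. Only the
<code>doap.name</code> role appears in the discussion above; the project name,
the URI, and the <code>doap.shortdesc</code> and <code>doap.homepage</code>
role values are assumptions, chosen here to mirror DOAP property names.</para>

<programlisting language="xml"><![CDATA[<para>
  <phrase role="doap.name">ExampleTool</phrase> is a
  <phrase role="doap.shortdesc">small command-line utility</phrase>.
  Its home page is
  <link role="doap.homepage"
        xlink:href="http://example.org/exampletool/">over here</link>.
</para>]]></programlisting>

<para>A processor that knows nothing about DOAP can still render this as
ordinary prose, because every element is a standard one; the roles only
matter to a tool that goes looking for them.</para>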

<para xml:id="p18">I’m not done thinking about this issue yet, but this little case study
supports a direction that’s starting to feel right to me: RDF is a good
tool for aggregating and analyzing data, but it’s not the right tool for
creating or maintaining information. In a sense, (some of) the RDF community are
already leaning this way too, with proposals like
<link xlink:href="http://www.w3.org/TR/grddl/">GRDDL</link> being developed
to define standard ways for extracting RDF from data that’s richly marked
but not directly encoded in RDF/XML.</para>
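
<para>To illustrate the kind of extraction this implies, here is a minimal
XSLT 1.0 sketch that pulls DOAP properties out of role-marked DocBook. It is
not GRDDL itself and not the stylesheet used on this site. It assumes a
hypothetical convention in which <code>phrase</code> and <code>link</code>
elements carry roles of the form <code>doap.</code><replaceable>property</replaceable>,
and it describes the project as an anonymous resource.</para>

<programlisting language="xml"><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:doap="http://usefulinc.com/ns/doap#"
    exclude-result-prefixes="db xlink">

  <xsl:output method="xml" indent="yes"/>

  <!-- Collect every role-marked phrase and link into one doap:Project -->
  <xsl:template match="/">
    <rdf:RDF>
      <doap:Project>
        <xsl:apply-templates
            select="//db:phrase[starts-with(@role, 'doap.')]
                    | //db:link[starts-with(@role, 'doap.')]"/>
      </doap:Project>
    </rdf:RDF>
  </xsl:template>

  <!-- A phrase becomes a literal-valued property named after its role -->
  <xsl:template match="db:phrase">
    <xsl:element name="doap:{substring-after(@role, 'doap.')}"
                 namespace="http://usefulinc.com/ns/doap#">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:element>
  </xsl:template>

  <!-- A link becomes a resource-valued property pointing at its target -->
  <xsl:template match="db:link">
    <xsl:element name="doap:{substring-after(@role, 'doap.')}"
                 namespace="http://usefulinc.com/ns/doap#">
      <xsl:attribute name="rdf:resource">
        <xsl:value-of select="@xlink:href"/>
      </xsl:attribute>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>

<para>The output is RDF/XML in one fixed shape, which is the point: the
document stays the thing that gets edited, and the RDF is generated from it
as a core dump for aggregators.</para>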

<para xml:id="p19">But for the record, the fact that I have to embed RDF/XML
<emphasis>in comments</emphasis> in XHTML still sucks.</para>

</essay>

