<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       version='pto'>
<info>
<title>Is RDF/XML Good for Anything?</title>
<volumenum>7</volumenum>
<issuenum>136</issuenum>
<pubdate>2004-07-30T08:27:55-04:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2004</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Having a standard transfer syntax for RDF is great.
XML is an ideal format for this sort of “core dump”: it’s amenable to
machine processing and it’s possible for a human being (with
sufficient skill, experience, and dedication) to look at it in a text
editor and “figure it out”.
So RDF/XML is good for RDF core dumps.
But is it something users should be writing by hand? I’m not sure.
</para>
</abstract>
</info>

<para xml:id='p1'>The title, I admit, is inflammatory, but I’m serious about the question.
I’ve been thinking about it ever since I read
<personname><firstname>Dorothea</firstname><surname>Salo</surname></personname>’s
double-barrelled
<link xlink:href="http://cavlec.yarinareth.net/archives/2004/07/21/look-we-get-it-already/">response</link> to 
<personname><firstname>Stefano</firstname><surname>Mazzocchi</surname></personname>’s
<link xlink:href="http://www.betaversion.org/~stefano/linotype/news/57/">guide</link>
to semantic web specs.</para>

<para xml:id='p2'>Dorothea’s rant is both fun to read and absolutely right on.</para>

<para xml:id='p3'>Chances are, you already know
this, but I’ll say it again anyway:
XML is designed to describe trees. Nested markup does this with
reasonable efficiency in exactly one way<footnote><para xml:id='p4'>“Exactly” is an
overstatement; there’s some variation because attributes are unordered
and there’s a small amount of variability in the syntax, but there’s
<emphasis>essentially</emphasis> one XML document that represents any
given tree.</para></footnote>.
RDF is a collection of (subject, predicate, object) triples that, generally
speaking, form a graph.
RDF/XML is a transfer syntax for graphs. There are lots of ways to “flatten”
a graph into a tree. There will always be significant variation in the possible
RDF/XML serializations of an RDF graph.
</para>
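
<para>To make that variation concrete, here is a minimal sketch of two
separate RDF/XML documents that should describe the same two triples. The
subject URI, title, and creator are invented for illustration; only the
<tag>rdf:Description</tag> syntax and the Dublin Core namespace are
standard.</para>

<programlisting><![CDATA[<!-- Document 1: each property is a child element -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/essay">
    <dc:title>An Example Essay</dc:title>
    <dc:creator>A. N. Author</dc:creator>
  </rdf:Description>
</rdf:RDF>

<!-- Document 2: the same triples, with each property as an attribute -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/essay"
                   dc:title="An Example Essay"
                   dc:creator="A. N. Author"/>
</rdf:RDF>]]></programlisting>

<para>An RDF parser should produce the same graph from either document, but
an XML tool that matches on element names would need to know about both
shapes, and about every other shape the syntax permits.</para>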

<para xml:id='p5'>Having a standard transfer syntax is great. The fact
that the PSVI, for example, <emphasis>doesn’t</emphasis> have one is a
common source of irritation. XML is a great transfer syntax for RDF.
XML is an ideal format for this sort of “core dump”: it’s amenable to
machine processing and it’s possible for a human being (with
sufficient skill, experience, and dedication) to look at it in a text
editor and “figure it out”.</para>

<para xml:id='p6'>So RDF/XML is good for RDF core dumps.</para>

<para xml:id='p7'>But is it something users should be writing by hand? I’m not sure.
And with impeccable timing, <personname><firstname>Edd</firstname>
<surname>Dumbill</surname></personname> enters the scene at this point
and announces <link xlink:href="http://usefulinc.com/doap/">DOAP</link>.</para>

<para xml:id='p8'>DOAP is an RDF vocabulary for describing metadata about projects. 
I have lots of projects, and maintaining the metadata about them (web pages,
syndication feeds, freshmeat announcements, CVS tags, email announcements,
etc.) is tedious and error-prone. Having a standard way to represent this
data is a <emphasis>fabulous</emphasis> idea.</para>

<para xml:id='p9'>So how should we store this metadata?</para>

<variablelist>
<varlistentry><term>In RDF/XML</term>
<listitem>
<para xml:id='p10'>One way would be to store the data directly in RDF/XML or some other
RDF transfer syntax. That’s (too) flexible and hard to validate. Besides, I’m
already suspicious that RDF/XML is for core dumps.</para>
</listitem>
</varlistentry>

<varlistentry><term>In a DOAP XML Format</term>
<listitem>
<para xml:id='p11'>Edd has taken a stab at making DOAP more palatable to the XML crowd
by providing a
<link xlink:href="http://www-106.ibm.com/developerworks/xml/library/x-osproj4/">RELAX NG
grammar</link> for DOAP files. That’s cool. Now he’s
got an XML format that just happens to be isomorphic to one possible RDF/XML
serialization. Does that really count? Yes, I think it does. How can I argue
that it doesn’t? It’s an XML format that I can edit with my normal RELAX NG-aware
editors.</para>
<para xml:id='p12'>But I’m leaning towards making “announcement” essays for my projects,
so having this information in a separate file doesn’t seem right. Duplication
of information is bad.</para>
</listitem>
</varlistentry>

<varlistentry><term>As Metadata in an Essay</term>
<listitem>
<para xml:id='p13'>My next idea was to make the DOAP vocabulary a metadata vocabulary
to put in the essay’s <tag>info</tag> element, just like I currently
allow Dublin Core terms in there.</para>
<para xml:id='p14'>I implemented that, and it worked, but it didn’t really solve the
duplication-of-information problem.</para>
</listitem>
</varlistentry>

<varlistentry><term>As <emphasis>Data</emphasis> in an Essay</term>
<listitem>
<para xml:id='p15'>At this point, I realized that I was going about this backwards.
An RDF-primary focus isn’t a very XML approach to the problem. What I should
do is put the information <emphasis>in the body of the essay</emphasis>
with enough markup to identify it.</para>
<para xml:id='p16'>I <link xlink:href="/2004/projects/sxpipe">implemented that too</link>.
I ended up with a bunch of <tag class="attribute">role</tag> attributes
on phrases and links to achieve it. It’s not the most attractive markup,
but I haven’t thought deeply yet about what the right markup is. (One
interesting exception is the <tag>doap:license</tag> element, which I left
in the <tag>info</tag>. I can’t think of an element that has the right semantics:
preserve the link, but don’t display the URI or make the link “hot”.)
</para>
<para xml:id='p17'>One thing that did occur to me, and that I think will occur to a lot
of you, is putting the DOAP markup right in the essay. Instead of saying
“<code>&lt;phrase&#160;role="doap.name">name&lt;/phrase></code>”, say
“<code>&lt;doap:name>name&lt;/doap:name></code>”. I see three problems with that
approach: first, it doesn’t work for all the DOAP structures because some
of them are nested; second, I’d have to define the prose processing expectations for
all those elements; and third, everyone looking at my XML would have to understand
all those elements, whereas using standard elements with roles makes document interchange
easier.</para>
</listitem>
</varlistentry>
</variablelist>
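
<para>The contrast between the last two styles can be sketched as follows.
Only the <code>doap.name</code> role appears in the discussion above; the
<code>doap.homepage</code> role, the project name, and the URI are
assumptions made for illustration and may not match the markup actually
used in the announcement essays.</para>

<programlisting><![CDATA[<!-- Standard elements with roles: any DocBook tool can still render this -->
<para>The latest release of
<phrase role="doap.name">Example Project</phrase> is available from its
<link role="doap.homepage"
      xlink:href="http://example.org/project/">home page</link>.</para>

<!-- DOAP elements inline: the processor must be taught how to render
     each one, and doap:homepage carries no text to display -->
<para>The latest release of
<doap:name>Example Project</doap:name> is available from its
<doap:homepage rdf:resource="http://example.org/project/"/>.</para>]]></programlisting>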

<para xml:id='p18'>I’m not done thinking about this issue yet, but this little case study
supports a direction that’s starting to feel right to me: RDF is a good
tool for aggregating and analyzing data, but it’s not the right tool for
creating or maintaining information. In a sense, (some of) the RDF community are
already leaning this way too, with proposals like
<link xlink:href="http://www.w3.org/TR/grddl/">GRDDL</link> being developed
to define standard ways for extracting RDF from data that’s richly marked
but not directly encoded in RDF/XML.</para>
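
<para>As a rough sketch of what such an extraction could look like, the
following XSLT 1.0 stylesheet turns role-marked phrases into DOAP property
elements. It is an illustration under stated assumptions, not the
stylesheet used on this site and not something GRDDL prescribes: it assumes
every relevant <tag>phrase</tag> has a role of the form
<code>doap.<replaceable>property</replaceable></code>, that all of them
describe a single project, and that each property is a plain literal.</para>

<programlisting><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:doap="http://usefulinc.com/ns/doap#"
    exclude-result-prefixes="db">

  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap every role-marked phrase in a single doap:Project node -->
  <xsl:template match="/">
    <rdf:RDF>
      <doap:Project>
        <xsl:apply-templates
            select="//db:phrase[starts-with(@role, 'doap.')]"/>
      </doap:Project>
    </rdf:RDF>
  </xsl:template>

  <!-- role="doap.name" becomes <doap:name>, and so on -->
  <xsl:template match="db:phrase">
    <xsl:element name="doap:{substring-after(@role, 'doap.')}"
                 namespace="http://usefulinc.com/ns/doap#">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>

<para>Links, nested structures, and resource-valued properties would each
need templates of their own; the sketch handles only the simplest
case.</para>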

<para xml:id='p19'>But for the record, the fact that I have to embed RDF/XML
<emphasis>in comments</emphasis> in XHTML still sucks.</para>

</essay>
