To RDF or not?

Volume 14, Issue 7; 22 Feb 2011

Over the weekend, I set out to rip the last of the RDF out of this site. Then I changed my mind.

First, a little history. When I started this weblog, I did so in part to experiment with a number of technologies, including RDF. The system I built did use RDF, and I got real benefits from it: combining data from several sources and using inference to generate some metadata that was used elsewhere.

But those benefits came at a substantial performance cost. Generating the pages involved an RDF shake-and-stir process that took several minutes to complete. That's not an indictment of RDF, or at least I don't mean it as one. I don't believe it's impossible to make RDF run fast; I simply wasn't using a fast tool.

When I rewrote the weblog infrastructure to run on MarkLogic Server, I took all of that out. Combining data is straightforward in the server, and the things I had used inferred metadata for were (more) easily done with XQuery.

The one place where RDF remained was the where/what/who pages, and those pages had drifted out of sync with the actual sources of their data. This bites me often enough that it needs fixing.

For example, if I add an itinerary to my calendar, I make sure the airports involved are in my address book. I use that data to generate the itinerary markup for my travel pages. The server also needs the airport data so that the Google map can be drawn on the itinerary pages. That means adding the airport data separately to the server.

This weekend, I set out to replace all the RDF data with more recent plain-old-XML pulled directly from the sources (my address book, for example). There were a couple of problems with this approach. The first was that I already use a sort of quasi-RDF model for my address book. Here, for example, is a crude representation of the address book entry for Denver International Airport:

              Denver International Airport
work          +1-800-247-2336
work          8500 Peña Blvd
              Denver CO 80249-6340
p:class       airports
geo:lat       39.861698
geo:long      -104.672997
rdf:type      wn:Airport
tax:wikipedia Denver_International_Airport
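To make the "quasi-RDF" point concrete: fields whose labels carry a namespace prefix (geo:lat, rdf:type, and so on) can be read directly as RDF statements about the entry, while plain labels like "work" are ordinary address-book fields. The minimal Python sketch below assumes the entry is available as a list of (label, value) pairs and uses a hypothetical subject identifier, "DEN"; it is an illustration only, not how the address book actually stores or exports its data. Values are kept as plain strings, so wn:Airport is not expanded into a full URI.

```python
import re

# A label counts as an RDF property if it looks like a prefixed name
# ("prefix:local"); plain labels such as "work" do not.
PREFIXED = re.compile(r"^[A-Za-z][\w.-]*:[\w.-]+$")


def entry_to_triples(subject, fields):
    """Return (subject, property, value) triples for the prefixed fields only."""
    return [(subject, label, value)
            for label, value in fields
            if PREFIXED.match(label)]


# The Denver International Airport entry as hypothetical (label, value) pairs.
DEN_FIELDS = [
    ("work", "+1-800-247-2336"),
    ("work", "8500 Peña Blvd"),
    ("p:class", "airports"),
    ("geo:lat", "39.861698"),
    ("geo:long", "-104.672997"),
    ("rdf:type", "wn:Airport"),
    ("tax:wikipedia", "Denver_International_Airport"),
]

if __name__ == "__main__":
    for triple in entry_to_triples("DEN", DEN_FIELDS):
        print(triple)
```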

So what this really boiled down to was a plan to replace RDF that looks like this:

<rdf:Description xmlns:pim="">
   <rdf:type rdf:resource=""/>
   <rdf:type rdf:resource=""/>
   <rdf:type rdf:resource=""/>
   <c:associatedName>Denver International Airport</c:associatedName>
   <c:workPhone rdf:resource="tel:+1-800-247-2336"/>
   <v:workAdr rdf:parseType="Resource">
      <rdf:type rdf:resource=""/>
      <v:street-address>8500 Peña Blvd</v:street-address>
   </v:workAdr>
   <foaf:homepage rdf:resource=""/>
   <foaf:homepage rdf:resource=""/>
   <p:weather rdf:resource=""/>
   <geo:lat rdf:datatype="">39.861698</geo:lat>
   <geo:long rdf:datatype="">-104.672997</geo:long>
   <owl:sameAs rdf:resource=""/>
</rdf:Description>

With plain-old-XML that looks like this:

<contact xmlns:rdf=""
         xml:id="DEN" company="1">
   <foaf:name>Denver International Airport</foaf:name>
   <company>Denver International Airport</company>
   <phone type="work">+1-800-247-2336</phone>
   <address type="work">
      <street>8500 Peña Blvd</street>
   </address>
   <rdf:type rdf:resource=""/>
   <p:weather rdf:resource=""/>
   <uri type="homepage"></uri>
   <uri type="homepage"></uri>
</contact>

On the whole, not really that much of a change. But it would be slightly simpler if I could just poke my address book into the database.

Trouble is, a little grepping (or rather, acking) revealed that I use the RDF data in a few places. There's nothing that couldn't be changed to use the new markup instead, but that would mean changing and testing each place.

About twenty minutes into my hacking, I decided that the simplest thing to do would be to write some XSLT to generate the RDF from the XML and leave the RDF alone in the database. It took several hours to get the two sets of data aligned again [never, ever, let there be two different sources for the same data, you know this! -ed] but I managed to get there.
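The XSLT itself isn't shown here. As a rough stand-in, this is a minimal Python sketch of the same direction of mapping (plain-old-XML contact in, RDF/XML description out) using only the standard library's xml.etree.ElementTree; it is a swapped-in technique, not the stylesheet I actually wrote. The RDF and FOAF namespace URIs are the real ones; the c:, v:, and p: namespace URIs, the rdf:about value derived from xml:id, and the sample type and homepage URIs are hypothetical placeholders. It covers only the fields visible in the two snippets above.

```python
import xml.etree.ElementTree as ET

# Real namespace URIs for RDF and FOAF.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical placeholders: the actual c:, v:, and p: URIs aren't shown.
C = "http://example.com/ns/contact#"
V = "http://example.com/ns/vcard#"
P = "http://example.com/ns/p#"
# ElementTree exposes xml:id under the built-in XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

for prefix, uri in (("rdf", RDF), ("foaf", FOAF), ("c", C), ("v", V), ("p", P)):
    ET.register_namespace(prefix, uri)


def q(ns, local):
    """Return an ElementTree qualified name such as '{ns}local'."""
    return "{%s}%s" % (ns, local)


def contact_to_rdf(contact):
    """Map one plain-old-XML <contact> element onto an rdf:Description."""
    desc = ET.Element(q(RDF, "Description"))
    # Hypothetical: use the contact's xml:id as a fragment identifier.
    desc.set(q(RDF, "about"), "#" + contact.get(XML_ID, ""))

    company = contact.findtext("company")
    if company:
        ET.SubElement(desc, q(C, "associatedName")).text = company

    for phone in contact.findall("phone[@type='work']"):
        ET.SubElement(desc, q(C, "workPhone"),
                      {q(RDF, "resource"): "tel:" + (phone.text or "").strip()})

    for addr in contact.findall("address[@type='work']"):
        adr = ET.SubElement(desc, q(V, "workAdr"),
                            {q(RDF, "parseType"): "Resource"})
        street = addr.findtext("street")
        if street:
            ET.SubElement(adr, q(V, "street-address")).text = street

    # rdf:type and p:weather are already RDF-shaped; copy them across as-is.
    for tag in (q(RDF, "type"), q(P, "weather")):
        for el in contact.findall(tag):
            ET.SubElement(desc, tag, dict(el.attrib))

    for uri in contact.findall("uri[@type='homepage']"):
        ET.SubElement(desc, q(FOAF, "homepage"),
                      {q(RDF, "resource"): (uri.text or "").strip()})
    return desc


def to_rdf(root):
    """Wrap an rdf:Description for every <contact> under root in rdf:RDF."""
    rdf = ET.Element(q(RDF, "RDF"))
    for contact in root.iter("contact"):
        rdf.append(contact_to_rdf(contact))
    return rdf


# Hypothetical sample input modeled on the DEN entry above.
SAMPLE = """<contacts xmlns:rdf="%s" xmlns:p="%s">
  <contact xml:id="DEN" company="1">
    <company>Denver International Airport</company>
    <phone type="work">+1-800-247-2336</phone>
    <address type="work"><street>8500 Peña Blvd</street></address>
    <rdf:type rdf:resource="http://example.com/ns/wn#Airport"/>
    <uri type="homepage">http://example.com/den</uri>
  </contact>
</contacts>""" % (RDF, P)

if __name__ == "__main__":
    print(ET.tostring(to_rdf(ET.fromstring(SAMPLE)), encoding="unicode"))
```

Generating the RDF this way keeps the address book as the single source of the data, while the existing code that reads the RDF stays untouched.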

I swear this essay seemed like it was worth writing before I started. Now I'm not so sure. Oh, well, hardly the first time I've wasted ten minutes of someone's time. Sorry, though.


Apology accepted. Just don't do it again, yes?

—Posted by Not a spammer on 23 Feb 2011 @ 09:16 UTC #

No promises. Caveat lector.

—Posted by Norman Walsh on 23 Feb 2011 @ 08:52 UTC #

Didn't know about ack. Now I do. Maybe it makes you feel a teeny bit better about posting.

—Posted by Marc on 26 Feb 2011 @ 12:46 UTC #

Definitely worth the time to read just to learn about ack - that other stuff is good too! :)

—Posted by William on 08 Mar 2011 @ 04:36 UTC #