Metadata big bang

Volume 10, Issue 14; 18 Feb 2007; last modified 08 Oct 2010

Hacking httpRange-14.

In real life, unlike in Shakespeare, the sweetness of the rose depends upon the name it bears. Things are not only what they are. They are, in very important respects, what they seem to be.

—Hubert H. Humphrey

Semantic web enthusiasts among my readers will have encountered the “hash vs. slash” debate, perhaps most famously in the TAG's attempt to resolve the range of HTTP: URIs.

When I first started this weblog, I knew it was going to be built on top of a semantic web framework. That wasn't out of any fervent belief that it's the future, but out of the conviction that I wouldn't be able to say whether or not I think it's the future if I didn't try it out.

I also knew about the hash vs. slash debate.

While I was never entirely convinced by the arguments of the “anti-slash” camp, I decided to simply avoid the issue by using a hash. There's never been any argument about hashed URIs, only slashed ones. As a result, the identifier for me, my physical person, became:

http://norman.walsh.name/knows/who#norman-walsh

As time passed, this had a practical consequence. If you dereferenced that URI, the server would send you the whole “who” file that contained all the metadata about everyone. That file got to be big.
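The cost comes from how fragments work: an HTTP client strips everything after the `#` before it sends the request, so the server only ever sees the URI of the whole document. The following minimal Python sketch uses the standard library's `urllib.parse.urldefrag` to show the split. The second person's URI is a hypothetical example, not one from the site.

```python
from urllib.parse import urldefrag

def request_target(uri: str) -> tuple[str, str]:
    """Split a URI into the part an HTTP client actually requests
    and the fragment, which never leaves the client."""
    base, fragment = urldefrag(uri)
    return base, fragment

me = "http://norman.walsh.name/knows/who#norman-walsh"
# Hypothetical second person described in the same "who" file.
someone = "http://norman.walsh.name/knows/who#someone-else"

print(request_target(me))
# ('http://norman.walsh.name/knows/who', 'norman-walsh')

# Both identifiers cause the same (large) document to be fetched.
print(request_target(me)[0] == request_target(someone)[0])
# True
```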

I ignored this problem as long as I could, simply living with the inconvenience, but when I decided to support “link groups” I faced a real hurdle.

The obvious URI for the “link group” about me is the URI that identifies me. But linking to the hashed URI made following the link way too expensive to be of practical value. I could have cooked up an alternate URI for the link group, but that would effectively have been an alias. Aliases: bad.

Unable to come up with a workaround I liked, I decided it was time for a big bang: I decided to change a whole lot of URIs. Instead of using a hashed URI to identify me (and everyone and everything else), I'd use a slashed one:

http://norman.walsh.name/knows/who/norman-walsh

In order to do this, I felt I needed to implement the TAG resolution on httpRange-14. Which I've done:

A GET on http://norman.walsh.name/knows/who/norman-walsh returns a 303 redirect to http://norman.walsh.name/knows/who/norman-walsh.html.
The HTML at http://norman.walsh.name/knows/who/norman-walsh.html includes a link to the metadata:
```
<link rel="alternate" type="application/rdf+xml"
      title="Metadata" href="norman-walsh.rdf" />
```
which I hope helps semantic web software find the underlying metadata.
That metadata in turn contains
```
<owl:sameAs rdf:resource="http://norman.walsh.name/knows/who#norman-walsh"/>
```
which tells the semantic web agent that this URI is entirely equivalent to the former hashed URI.
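The first step, the 303, is the part most servers don't do by default. The following minimal sketch reproduces that behavior with Python's standard `http.server`. It assumes a hypothetical `THINGS` set listing the slashed URIs that identify things rather than documents. It only issues the redirect and doesn't serve the `.html` or `.rdf` files. The post doesn't say how the real server is configured.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

BASE = "http://norman.walsh.name"

# Hypothetical registry of paths whose URIs identify things, not documents.
THINGS = {"/knows/who/norman-walsh"}

def route(path: str) -> tuple[int, dict[str, str]]:
    """Choose a status and headers for a GET on `path`.

    A thing's URI answers 303 See Other, pointing at an HTML document
    about it. Everything else (including paths with query strings) is
    left to normal file serving, represented here by a bare 404.
    """
    if path in THINGS:
        return 303, {"Location": BASE + path + ".html"}
    return 404, {}

class ThingRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, headers = route(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", "0")  # no body in this sketch
        self.end_headers()

def serve(port: int = 8000) -> None:
    """Run the sketch locally; blocks until interrupted."""
    HTTPServer(("localhost", port), ThingRedirectHandler).serve_forever()
```

After calling `serve()`, `curl -i http://localhost:8000/knows/who/norman-walsh` should show a `303` status and a `Location` header ending in `norman-walsh.html`.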

Hopefully this all “just works” in every meaningful way.
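You can check the client side of the steps above with only the Python standard library. This minimal sketch finds the `rel="alternate"` RDF link in the HTML and resolves its relative `href` against the page URL. It then pulls the `owl:sameAs` targets out of the RDF/XML. The RDF sample is a hypothetical minimal document, not the actual contents of `norman-walsh.rdf`. RDF/XML can also express `owl:sameAs` in forms this simple element scan won't catch, so a real agent would use a proper RDF parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"

class AlternateLinkFinder(HTMLParser):
    """Collect hrefs of <link rel="alternate" type="application/rdf+xml">.
    Self-closing <link ... /> tags reach handle_starttag via the default
    handle_startendtag, so no extra override is needed."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if (tag == "link" and "alternate" in rels
                and a.get("type") == "application/rdf+xml" and a.get("href")):
            self.hrefs.append(a["href"])

def metadata_url(page_url: str, html: str):
    """Absolute URL of the first RDF alternate link, or None."""
    finder = AlternateLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.hrefs[0]) if finder.hrefs else None

def same_as_targets(rdf_xml: str) -> list[str]:
    """rdf:resource values of every owl:sameAs element."""
    root = ET.fromstring(rdf_xml)
    return [el.get(f"{{{RDF_NS}}}resource")
            for el in root.iter(f"{{{OWL_NS}}}sameAs")]

page = "http://norman.walsh.name/knows/who/norman-walsh.html"
html = """<html><head>
<link rel="alternate" type="application/rdf+xml"
      title="Metadata" href="norman-walsh.rdf" />
</head><body></body></html>"""
# Hypothetical minimal metadata document.
rdf = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="http://norman.walsh.name/knows/who/norman-walsh">
    <owl:sameAs rdf:resource="http://norman.walsh.name/knows/who#norman-walsh"/>
  </rdf:Description>
</rdf:RDF>"""

print(metadata_url(page, html))
# http://norman.walsh.name/knows/who/norman-walsh.rdf
print(same_as_targets(rdf))
# ['http://norman.walsh.name/knows/who#norman-walsh']
```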

Comments

The link to "link groups" above is broken.

Looks very maintainable.

There is another very simple solution to the entire hash-versus-slash debate: whenever you would want to identify anything with a hashless URI, suffix it with #referent. The meaning of x#referent is: I identify whatever x is about. And x is simply an information resource (about x#referent). Of course this does not put the httpRange-14 resolution to the test, like your approach does.
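The convention described in this comment can be stated in a couple of lines. This is a minimal sketch with hypothetical helper names and a hypothetical `example.org` URI. It isn't taken from the linked article.

```python
from urllib.parse import urldefrag

def referent(document_uri: str) -> str:
    """Name whatever the (hashless) document at document_uri is about."""
    base, fragment = urldefrag(document_uri)
    if fragment:
        raise ValueError("expected a hashless URI")
    return base + "#referent"

def document_for(uri: str) -> str:
    """The information resource describing a #referent URI."""
    return urldefrag(uri)[0]

alice = referent("http://example.org/people/alice")
print(alice)                # http://example.org/people/alice#referent
print(document_for(alice))  # http://example.org/people/alice
```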

See http://www.marcdegraauw.com/2007/02/20/the-referent-convention/ for details.

What doesn't convince me is that the Accept header is ignored, and the redirect to the "uncool" *.html URI.

The URI of the person should redirect to the personal profile document, and the personal profile document should dereference to the best representation according to the Accept header. Why force semweb clients to at least partially understand and download HTML?

Yes, I agree that the server should honor accept headers. Alas, my service provider doesn't support the content negotiation module so I can't. :-(
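For reference, here is a minimal Python sketch of the selection step the commenter describes, which picks between the HTML and RDF representations based on the `Accept` header. The `OFFERS` table and the preference for HTML on ties are assumptions. The parser is also simplified and ignores media-type parameters other than `q`.

```python
from typing import Optional

# Representations of the profile document; earlier entries win ties.
OFFERS = {
    "text/html": "norman-walsh.html",
    "application/rdf+xml": "norman-walsh.rdf",
}

def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media range, q) pairs."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges

def quality(offered: str, ranges: list[tuple[str, float]]) -> float:
    """q-value of the most specific range matching `offered`:
    an exact type beats type/*, which beats */*."""
    otype = offered.split("/")[0]
    best_spec, best_q = -1, 0.0
    for media, q in ranges:
        if media == offered:
            spec = 2
        elif media == otype + "/*":
            spec = 1
        elif media == "*/*":
            spec = 0
        else:
            continue
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q

def negotiate(accept: Optional[str]) -> Optional[str]:
    """Pick a file for the Accept header; None means 406 Not Acceptable."""
    if not accept:
        return OFFERS["text/html"]  # no header: any type is acceptable
    ranges = parse_accept(accept)
    best_file, best_q = None, 0.0
    for media, filename in OFFERS.items():
        q = quality(media, ranges)
        if q > best_q:  # strict comparison, so earlier offers win ties
            best_file, best_q = filename, q
    return best_file

print(negotiate("application/rdf+xml"))  # norman-walsh.rdf
print(negotiate("text/html,application/xml;q=0.9,*/*;q=0.8"))  # norman-walsh.html
```

A typical browser header ranks `text/html` above `*/*`, so the browser gets the HTML page. An agent asking for `application/rdf+xml` gets the RDF without having to parse any HTML.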

It would be useful to have a way to specify how to get from the information resource back to the non-information resource.

A reverse-link relationship seems appropriate, because a forward link suggests that it might be something to follow, and that wouldn't be useful in this case.

There isn't a link relation that quite has the semantics of a Location or Content-Location header, but a rel=alternate link is close-ish, so that could be reversed, giving a rev=alternate link from the information resource back to the non-information resource.

Or you could use the RFC2068 Link header to do the same.

Any better ideas? I've seen rel=bookmark suggested, but that doesn't seem quite right.