On the range of http: URIs

Volume 8, Issue 94; 19 Jun 2005; last modified 08 Oct 2010

A compromise has been reached, at least among those of us on the TAG. I hope the larger community will accept the compromise as well.

Language is by its very nature a communal thing; that is, it expresses never the exact thing but a compromise—that which is common to you, me, and everybody.

—T. E. Hulme

Records show that httpRange-14 was raised to the TAG on 25 March 2002. We've been debating its resolution for more than three years. I will not attempt to recapitulate that debate in any detail; your search engine of choice will no doubt lead you to it. Suffice it to say that some folks think http://example.org/some/path must identify “a document” (what the AWWW calls an “information resource”) and some folks think it can identify anything at all.

These world views are not compatible. We've gone around and around and around searching for a compromise. And at last, we have found one.

Remember: this is a compromise. What's needed here is something that everyone can live with, not something that necessarily makes everyone happy. Before you cry foul, if you're inclined to do so, consider carefully whether you think any better compromise is possible. Personally, I don't.

So here it is. Given an http: URI without a fragment identifier, http://example.org/some/path, you can't tell if it's an information resource or something more general. That's OK: given http://example.org/some/path#fragid, you can't tell what that is either. Maybe that's a hole in the web architecture, maybe it isn't, but this compromise doesn't introduce that problem. The bottom line is, if you haven't gone off and retrieved it yourself, maybe you believe what someone tells you about the URI and maybe you don't.

Let's say you want to check, so you go out and attempt to dereference the URI:

If you get back 200 OK: The resource is an information resource. That means it isn't a car or a person or an idea.
If you get back 303 See Other: The resource could be an information resource or it could be something more general. Maybe it is a car or a person or an idea, but maybe it's just a web page too. Hopefully the “see other” thing will help you figure that out.
If you get back 404 Not Found: The nature of the resource is unclear. You haven't learned anything. You don't learn anything from any of the other 4xx errors either.

This compromise allows vocabularies (like Dublin Core, FOAF, and WordNet) to continue to use “slash” instead of “hash” for pragmatic reasons. It also preserves the common understanding that if you dereference a URI and you get back a document, that document is in some significant sense the thing you were attempting to get. At the same time, it allows you to use any URI you want for any resource you want, as long as you are willing to take some care in how you serve up representations for it.
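On the publishing side, “some care” can be as simple as never answering 200 for a URI that names something other than a document, and answering 303 See Other with a pointer to a document about it instead. What follows is a minimal sketch assuming Python's standard `http.server`. The paths `/people/bob` and `/docs/bob` and the names `route`, `Handler`, and `serve` are hypothetical.

```python
from __future__ import annotations

from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical URI space: paths that name things which are not documents,
# each mapped to the path of a document that describes them.
THINGS = {"/people/bob": "/docs/bob"}
DOCUMENTS = {"/docs/bob": b"<p>A page about Bob.</p>\n"}


def route(path: str) -> tuple[int, dict[str, str], bytes]:
    """Decide status, headers, and body for a request path."""
    if path in THINGS:
        # Not an information resource: never answer 200, point elsewhere.
        return 303, {"Location": THINGS[path]}, b""
    if path in DOCUMENTS:
        return 200, {"Content-Type": "text/html; charset=utf-8"}, DOCUMENTS[path]
    return 404, {"Content-Type": "text/plain; charset=utf-8"}, b"Not Found\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = route(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch on localhost until interrupted."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

After calling `serve()`, a GET of http://localhost:8000/people/bob should come back 303 with a Location of /docs/bob, and that document in turn answers 200. Keeping the decision in `route` separates the policy (which URIs are documents) from the HTTP plumbing.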

Comments

Why does it matter? Why can't a URI, a URI string, or a URIRef (or a URI whatever [!]) mean whatever the minter says it means? If I put a document containing a description of Bob at his URI, that doesn't mean the URI identifies that document as an information resource; it could be there for convenience, for instance.

I think the context defines whether the resource is dereferenceable or not. When defining URIRefs in RDF, they should never be dereferenced; all that matters is the relationship between names. However, when using URI strings in an HTML document's anchors, it is important that EVERY one is dereferenceable (otherwise I get broken links). It is up to the context to specify whether they can be dereferenced or not. After all, are there any specs that specify different behavior for indistinguishable URI strings depending on whether they are dereferenceable or abstract?

P.S. The definition of "Information Resource" stinks. It allows postal mail to be considered an "Information Resource" (and if that's allowed, then the example of paper documents not being IRs is wrong since they can be mailed).

I think this is a reasonable compromise. I'd rather that HTTP URIs only referenced things you can get with HTTP but given the widespread existing usage it would be a Canute-like action to rule otherwise.

Is there a formal write-up anywhere, yet? The TAG issues document doesn't seem to reference one but maybe that's coming.

Do the same rules apply to HTTP URIs with fragment identifiers? I.e., is it that if you can GET the document and it is of a type for which fragment identifiers make sense and the document contains a fragment with that ID then it references that fragment, otherwise you don't know what it references?

To people not deeply immersed in the RDF world view, it seems very peculiar that an HTTP URL might deliberately not refer to an electronic document whose representation is downloadable using HTTP.

I quite like the conventions used in ISO topic maps, where they distinguish between using a URI to identify a resource (electronic document) and using a URI to identify a subject (= concept, topic). Subject indicators might or might not be dereferenceable. For example, I can use http://topicmaps.org/1.0/languages.xtm#en as a subject indicator for the English language, without raising the question of whether one can download the English language by feeding this resource into a web browser. Obviously RDF does the same, but topic maps make the distinction explicit.

I personally am convinced the resource http://www.alleged.org.uk/pdc/ is not me and I am not http://www.alleged.org.uk/pdc/, but I would be more relaxed about being identified by the subject indicator http://www.alleged.org.uk/pdc/. But then maybe I am strange: I also object to our office convention of giving one's computer the same name as oneself.

Never mind... I guess that's what 303 is for. I still don't like it, but I guess that's why it is a compromise.

Bit of a rabbit out of the hat this. I didn't expect a resolution to httpRange-14 in this lifetime, yet it does seem a reasonable compromise. I assume a Note or somesuch will follow. I'm curious about the implications for namespaces, especially those for schemas - are they information resources or other-things? (I'd guess 303 to the schema doc).

The Topic Maps point is an interesting one, though I think I'd describe the RDF case the other way around: RDF in itself doesn't really understand HTTP so the notion of (non-)dereferenceability is orthogonal. But then again, maybe this resolution can bring RDF & HTTP closer (along the lines of URIQA)?

This page looks really bad as a PDF document.

I don't think this compromise helps much. My objection is set out in the post “Cars are Information Resources and Documents are Objects Too.” From that post: "The boundary between matter and information is fuzzy and indistinct. DNA is information but it is a molecule as well. Ink is a chemical liquid that dries on paper but can also be text. In the end, I think the distinction between object-documents like those about cars and document-objects like the Declaration of Independence or the Mona Lisa will turn out to be personal and context dependent. The real distinction is between those things that now have an HTTP interface, and those that do not yet have one."

I've improved the PDF. It still has a couple of badly filled paragraphs, but nothing as awful as that list was. Thanks, Jimmy!