Vicious Circle

Volume 6, Issue 66; 29 Jul 2003

The TAG is trying to get to last call. There's lots of hard work left to do on our principal deliverable, but hard work isn't a problem. Intractable issues, those are a problem. The question is, how intractable is httpRange-14?

The first step towards wisdom is calling things by their right names.

—Chinese Proverb

The TAG is working hard to get its principal deliverable, Architecture of the World Wide Web, farther along the recommendation track. We'd like to get to last call this year. Several of us think it's very important that we do.

There's lots of wordsmithing to be done yet; the document needs to be really clear about some of its key terms (resource, representation, identifies, etc.). Different communities bring different perspectives to the table, and this document is important to a lot of communities. There's still some hard work ahead.

We also have a major, contentious issue on the table: httpRange-14. I'm not certain that we need to solve this issue before we go to last call, but I'm not certain we don't either. I am certain, however, that getting it sorted would make forward progress easier.

This is an issue we've talked about for a long time, but we've made little progress. It flared up recently and there have been a few long threads on www-tag this month.

Last night it occurred to me that this issue is related to an issue that's older still: names and addresses. At least, I think it's related. And it's a vicious circle of about a dozen steps.

1. The hypertext web is built on the http protocol. This protocol is handy because it can actually be used to get documents over the web and it has enough flexibility to handle things like content negotiation. Unlike ftp and gopher and some of its other predecessors, it communicates not only the bits of the document but also metadata headers about the document.
2. Things that start “http:” used to be called URLs. URL was an acronym for Uniform Resource Locator, which suggests that these things are for saying where something is located: its address, not its name.
3. URNs, Uniform Resource Names, were developed at about the same time. They generally start with “urn:”, although there are other schemes that appear equally name-like (“uuid:”, “mid:”, etc.). The name URN suggests that these things are for saying what something is called: its name, not its address.
4. The practical downside of URNs is that even if they identify documents on the web, most people don't know how to get from the URN to the document. You can't just type “urn:...” into the address bar of most browsers and expect anything useful to happen.
5. For this and perhaps other reasons, it has been asserted that URLs are as good a name for something as a URN and that, in fact, the distinction between names and addresses is inappropriate.

Bah. I don't believe it. I never have. The very farthest I can make my mind go is the observation that in principle a string that begins “http:” is no less useful syntactically as a name than any other string. I maintain, however, that using these strings as names is confusing because it violates the naive user's expectation that they're addresses. (But Norm, users don't think they're addresses, they think they're names. No they don't. Ask my Mom, she thinks she's “going to a website” when she uses one of those strings and even sophisticated users speak of “getting a page from that server”. Those are very address-bound descriptions, not name-bound.) I could legally change my name to Cheyenne Wyoming if I wanted to, but it'd be a source of confusion every time I was asked for my name and address.

Anyway, I am so often so completely outnumbered with respect to this view that I've largely given up. I don't accept that that makes me wrong, but I do accept that the really important thing in standards work is consensus. You have to pick your battles and this isn't one I'm going to win, at least not in the short term.
6. The term URI, Uniform Resource Identifier, was chosen as a more neutral term for the general class of web identifiers. It means URLs and URNs and all the other strings that obey the rules of web identifiers (RFC 2396).
7. Any URI is a legitimate web identifier, but the most useful identifiers are ones for which you can get back some description. In other words, even if you create a URI that is useful without a representation (e.g., an XML Namespace), it's better if you give it a representation.

Life is better if, when you encounter a URI for something and you don't know what it is, you can stick it in your browser and get back something that tells you what it is. I accept that.
8. All of this worked perfectly well for many years. But now there's a trend towards using URIs for purposes other than putting pages up on the web. RDF and other Semantic Web activities are using URIs to identify people and places and cars and dreams.

There's some disagreement about exactly how to name an object in the real or imaginary world with a URI. Some folks say you should use a fragment identifier, others say you don't have to. My mind isn't made up. I generally do use a fragment identifier because I think the assertion that a URI with a fragment identifier points to a dream or a car is absolutely unassailable. From a specification legalese point of view, that's rock solid. But some folks don't use a fragment identifier and yet I still seem to be able to make productive use of their URIs, so it's not obvious to me that failure to use a fragment identifier breaks anything. (I chose my words carefully: I'm not saying categorically that things don't break, I'm saying I haven't seen any breakage.)
9. This brings us to the thorny TAG issue httpRange-14. The thrust of this issue is that some folks say http URIs (without fragment identifiers) must point to documents and others say that URIs are totally opaque when used as identifiers and so any URI can point to anything.
10. The folks who say that they have to be documents, or more generally in the last few days “information resources”, claim that it's imperative to be able to distinguish between documents that are about something and the something they're about.
11. Given a URI (a name) for something, they want to be able to go off and find (at an address) the description of that thing. The URI that points to the description has to point to a document because it's documents that contain descriptions.

Look, they say (or I think they say, I struggle mightily to understand their point of view, but I may be failing), if you tell me that http://example.com/foo is a car and I go off to http://example.com/foo and get back a description of that car, the description must come from a document so there's automatically a contradiction. That URI can't be both a car and a document because cars and documents are disjoint.
12. Part of the problem here is that it's so important to be able to go off and find a description of something given its name, everyone wants to use http URIs. See step 1.
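
To make the fragment-identifier option from the steps above a little more concrete, here is a minimal Python sketch. It is only an illustration, not anything the TAG has specified: the URI http://example.com/foo#car and the helper name split_name are made up for this example. It shows that a URI with a fragment splits cleanly into the part a client would retrieve and the part that is left over to name something else.

```python
from urllib.parse import urldefrag


def split_name(uri):
    """Split a URI into (retrievable part, fragment).

    split_name is a hypothetical helper for this sketch. Only the
    first part would be sent to a server; the fragment stays with
    the client.
    """
    document, fragment = urldefrag(uri)
    return document, fragment


# Hypothetical name for the car itself, as opposed to its description.
car = "http://example.com/foo#car"
document, fragment = split_name(car)
print(document)   # http://example.com/foo
print(fragment)   # car
```

In this sketch the document URI and the car URI are different strings, so statements about one need not be statements about the other. A URI without a fragment gives back the same string for both roles, which is exactly where the disagreement starts.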

If we weren't conflating names and addresses in http URIs, this problem would go away. We'd have to deploy a universally available technical solution to the problem of getting from a name to an address, but at least we'd be able to distinguish between the names of things and their addresses.
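
As a rough picture of what "getting from a name to an address" might involve, consider the sketch below. Everything in it is an assumption made for illustration: the RESOLVER table, the resolve function, and both the URN and the URL are invented, and no such universally deployed service exists.

```python
# Hypothetical resolver table: names on the left, addresses of
# descriptions on the right. Both entries are invented examples.
RESOLVER = {
    "urn:example:car:42": "http://example.com/cars/42.html",
}


def resolve(name):
    """Return the address of a description of `name`, or None if unknown."""
    return RESOLVER.get(name)


print(resolve("urn:example:car:42"))     # http://example.com/cars/42.html
print(resolve("urn:example:dream:7"))    # None
```

The lookup itself is trivial; the hard part, as noted above, is deploying something like it so that it is universally available.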

I think that's evidence that I'm right about names and addresses. But I fully expect to be outnumbered.

Comments

I would keep discussion of this in the www-tag@w3.org archive, except that the questions and answers are old. For the myth that this is solved by distinguishing between names and addresses, see <http://www.w3.org/DesignIssues/NameMyth>. For the reason why URIs have to identify an information resource by any other name, see <http://www.w3.org/DesignIssues/HTTP-URI#argument>.

I'd be interested to know how you justify the statement:

I think the assertion that a URI with a fragment identifier points to a dream or a car is absolutely unassailable. From a specification legalese point of view, that's rock solid.

My reading of RFC 2396 is quite the opposite. It says, at the end of section 4.1:

The fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.

In general, I don't see why something which appears to be a reference to a part of or point in a document is any more a candidate to identify a non-network retrievable object than something which appears to reference a whole document.

I think there is a strange, narrow, and counterintuitive path between Scylla and Charybdis that actually works.

a) We first have to plant in our minds that HTTP GET never returns a resource, it always returns a representation of the resource. If you get a 404, that doesn't mean that the resource doesn't exist, it means that no representation can be found.

b) Consequently, doing a GET on my home page does not return my home page; it returns an HTML representation thereof. Documents are just as "off the Web" as Shakespeare is; doing a GET on this does not of course return Shakespeare, and by the same token, doing a GET on this does not return the XML Recommendation.

c) That being so, when we use a URL as a name and attribute properties to that URL, it isn't clear whether we are attributing properties to the resource or the representation. Shakespeare-the-resource has a birthdate and a height in feet and inches; Shakespeare-the-representation has a creation time and a height in pixels. Shakespeare-the-resource had a wife and children; Shakespeare-the-representation has no descendants. All is muddle?

d) Almost but not quite. It now becomes a meta-property of each property whether it applies to resources or representations. The dc:creator property, for example, applies to the resource: it is the person who wrote the document, not the person who made this particular representation by copying it from somewhere else. In this way we escape the trap and go free: it is part of the context of use for each URI whether we are making resource-based or representation-based use of it. The topic-maps folks have always gotten this part right.
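
Points a) through d) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not real HTTP: the Resource and Representation classes, the get function, the APPLIES_TO table, and the example.org URI are all hypothetical names invented here.

```python
from dataclasses import dataclass, field


@dataclass
class Representation:
    media_type: str   # e.g. "text/html"
    body: str


@dataclass
class Resource:
    uri: str
    representations: list = field(default_factory=list)


def get(resource):
    """Toy GET: return (status, representation), never the resource itself."""
    if resource.representations:
        return 200, resource.representations[0]
    # No representation found; the resource may still exist.
    return 404, None


# Hypothetical meta-property table: what each property describes.
APPLIES_TO = {
    "dc:creator": "resource",
    "birthdate": "resource",
    "height_in_pixels": "representation",
}


def subject_of(prop):
    """Say whether a statement using `prop` is about the resource or a representation."""
    return APPLIES_TO[prop]


shakespeare = Resource(
    "http://example.org/shakespeare",
    [Representation("text/html", "<p>William Shakespeare</p>")],
)
status, rep = get(shakespeare)
print(status, rep.media_type)      # 200 text/html
print(subject_of("dc:creator"))    # resource
```

The one URI serves both uses; which side a statement lands on is decided by the property, not by the URI.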