On the Web, My Name is 266 North Pleasant Street

Volume 7, Issue 31; 03 Mar 2004; last modified 08 Oct 2010

There has been long debate, both philosophical and technical, on the relative merits of the distinction (or lack thereof) between names and addresses. I’ve said my piece.

Man is not logical and his intellectual history is a record of mental reserves and compromises. He hangs on to what he can in his old beliefs even when he is compelled to surrender their logical basis.

—John Dewey

Before the web, there was SGML. SGML identifies external subsets, external parsed and unparsed entities, notations, and perhaps a few other things I’ve forgotten about, with external identifiers. External identifiers have two parts: a public identifier and a system identifier. The public identifier is “a name” and the system identifier is “a location”.

Historically, system identifiers weren’t URIs and what was a reasonable identifier in one system might have been unintelligible in another. Public identifiers provided a hook for interoperability. Both systems could find the external identifier associated with this document type declaration:

<!DOCTYPE book PUBLIC "-//Owner//DTD Name//EN" "c:\:/name.dtd">

because they had the name if they didn’t understand the location. In fact, in SGML, the system identifier was entirely optional:

<!DOCTYPE book PUBLIC "-//Owner//DTD Name//EN">

because implementations made use of the fact that they could map from the name, the public identifier, to the appropriate local representation.
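The lookup described here can be sketched in a few lines of Python. This is only an illustration of the idea, not any particular SGML system's implementation: the `CATALOG` table, its local path, and the `resolve_external_id` function are all hypothetical names invented for the example.

```python
from typing import Optional

# Hypothetical mapping from public identifiers (names) to local files.
# The path below is made up for illustration.
CATALOG = {
    "-//Owner//DTD Name//EN": "/usr/local/share/sgml/owner/name.dtd",
}


def resolve_external_id(public_id: Optional[str],
                        system_id: Optional[str] = None) -> Optional[str]:
    """Prefer the name; fall back to the location if the name is unknown."""
    if public_id is not None:
        # Collapse runs of whitespace so trivially different spellings
        # of the same public identifier still match.
        normalized = " ".join(public_id.split())
        local = CATALOG.get(normalized)
        if local is not None:
            return local
    # No mapping for the name: all that is left is the system identifier,
    # which may or may not mean anything on this machine.
    return system_id
```

With this table, both document type declarations shown above resolve to the same local file, whether or not a system identifier is present.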

OASIS, then called SGML Open, defined a standard mechanism for describing this mapping in TR 9401:1997, known colloquially as “SGML Open Catalogs” or “SOCATs”.

External identifiers survived into XML 1.0. In order to conform to the evolving architecture of the web, system identifiers were made required in XML.

Over the course of more than 10 years working with SGML and XML documents, the presence of names in external identifiers has saved many hours, perhaps many hundreds of hours, of my time. I consider that positive value.

As XML developed, I tried, unsuccessfully, to extend the notion of names and identifiers into the new technologies that were developing (stylesheets, schemas, etc.). With Paul Grosso and John Cowan, I wrote RFC 3151, A URN Namespace for Public Identifiers, in order to preserve public identifiers in a URI-only world.
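To make the RFC 3151 idea concrete, here is a rough Python sketch of turning a public identifier into a `urn:publicid:` URN. The transcription table follows my reading of the RFC and should be checked against the RFC text before being relied on; the function name `publicid_to_urn` is an assumption made for this example.

```python
import re

# Transcription table as described in RFC 3151 (treat as an assumption
# and verify against the RFC before use).
_TRANSCRIBE = {
    "//": ":",
    "::": ";",
    "+": "%2B",
    ":": "%3A",
    "/": "%2F",
    ";": "%3B",
    "'": "%27",
    "?": "%3F",
    "#": "%23",
    "%": "%25",
    " ": "+",
}

# Two-character sequences are listed first so they win over "/" and ":".
_PATTERN = re.compile(r"//|::|[+:/;'?#% ]")


def publicid_to_urn(public_id: str) -> str:
    """Convert a public identifier to a urn:publicid: URN."""
    # Normalize whitespace: trim the ends and collapse internal runs.
    normalized = " ".join(public_id.split())
    # A single pass avoids escaping characters introduced by earlier rules.
    body = _PATTERN.sub(lambda m: _TRANSCRIBE[m.group(0)], normalized)
    return "urn:publicid:" + body
```

Applied to the public identifier used earlier, `publicid_to_urn("-//Owner//DTD Name//EN")` gives `urn:publicid:-:Owner:DTD+Name:EN`.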

I’ve argued my case in many forums. Most recently, this came up in a thread on the Atom mailing list. I have always been in the minority, though I have sometimes been encouraged by like-minded colleagues.

Web Architecture

On the other hand, as a member of the Technical Architecture Group at the W3C, I have explicitly voted to approve Architecture of the World Wide Web 1.0 as a consensus opinion on web architecture.

That document says, in part:

So: I’ve got a new resource that I want to identify. Given my public commitment to the WebArch document, I feel that I ought not to violate its tenets. That means I want to use a URI, I want to provide a representation, I don’t want to create multiple URIs, and I don’t want to use a new scheme.

The WebArch document expresses an explicit bias towards HTTP. There’s a whole set of infrastructure built around HTTP that makes it a pretty compelling protocol if you’re going to serve up a representation.

That means I’m going to identify my document with an HTTP URI and only an HTTP URI. That URI becomes both its name and its address, if you like (or even if you don’t).

All Is (Not Quite) Lost

I’ve lost my names. Presented with a document, I will be forced to figure out what representation to use to process it based only on its single URI.

Remember my document interchange scenario? That’s where folks send me documents to process. It still happens, so what do I do with this document:

<book xsi:noNamespaceSchemaLocation="http://example.org/path/to/book.xsd">

On the web, maybe that’s easy: I just go off and get the resource. At this point, the infrastructure that I mentioned earlier comes into play. Perhaps some intermediate cache will return the representation, perhaps the server will tell us the document has moved and another GET will be issued, etc. But what if I’m not connected?

I get some significant relief from XML Catalogs, developed by the Entity Resolution Technical Committee at OASIS. XML Catalogs provide for XML what SOCATs provide for SGML. In particular, they allow me to map external identifiers and URIs to local representations. So I can use this entry to map the URI:

<uri name="http://example.org/path/to/book.xsd"
     uri="/my/local/path/to/book.xsd"/>
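As a rough illustration of what a resolver does with such an entry, the Python sketch below wraps the entry in a catalog document and looks up a URI in it. It is deliberately minimal and is not a conforming XML Catalogs implementation: it handles only exact-match `uri` entries and ignores rewriting, delegation, `xml:base`, and chained catalogs. The function names `load_uri_entries` and `resolve_uri` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace name used by OASIS XML Catalogs.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# The catalog entry from the text, wrapped in a catalog element.
CATALOG_XML = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://example.org/path/to/book.xsd"
       uri="/my/local/path/to/book.xsd"/>
</catalog>
"""


def load_uri_entries(catalog_xml: str) -> dict:
    """Collect exact-match uri entries as a name -> local URI mapping."""
    root = ET.fromstring(catalog_xml)
    tag = "{%s}uri" % CATALOG_NS
    return {entry.get("name"): entry.get("uri") for entry in root.iter(tag)}


def resolve_uri(uri: str, entries: dict) -> str:
    """Return the local representation if the catalog knows the URI;
    otherwise return the URI unchanged so the caller can try the network."""
    return entries.get(uri, uri)


entries = load_uri_entries(CATALOG_XML)
local_copy = resolve_uri("http://example.org/path/to/book.xsd", entries)
```

A URI that is not listed comes back unchanged, which is the point at which a disconnected machine is stuck.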

Alas, it’s not a total win. What about documents like this:

<book xsi:noNamespaceSchemaLocation="../../book.xsd">

If I don’t have book.xsd in the same relative location as the sender, I lose. And in this case:

<book xsi:noNamespaceSchemaLocation="file:///c:/path/to/book.xsd">

I just lose outright, although in this case I could argue that the author is at fault: he’s given a different URI to the same resource, bifurcating the web. But if caches or resolvers of some sort aren’t widely deployed, authors will do this, because they don’t have a practical alternative, and I lose.
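The relative-reference case can be made concrete with Python's standard `urllib.parse.urljoin`. The two base URIs below are hypothetical; they stand in for wherever the sender and the recipient happen to store the same document.

```python
from urllib.parse import urljoin

# The schema reference from the document, exactly as the sender wrote it.
SCHEMA_REF = "../../book.xsd"

# Hypothetical locations of the same document on two different machines.
SENDER_BASE = "file:///home/sender/projects/docs/drafts/book.xml"
RECEIVER_BASE = "file:///home/me/inbox/2004/book.xml"


def schema_location(document_uri: str, reference: str = SCHEMA_REF) -> str:
    """Resolve the schema reference against the document's own URI."""
    return urljoin(document_uri, reference)


sender_schema = schema_location(SENDER_BASE)
receiver_schema = schema_location(RECEIVER_BASE)

# The same reference names two different resources, one per machine.
same_resource = sender_schema == receiver_schema
```

Here `same_resource` is false: the reference only works if the recipient reproduces the sender's directory layout, and no catalog entry keyed on one absolute URI can anticipate every place the document might land.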

Planet Web

I live on Planet Web too. I pour a fair amount of my intellectual effort into understanding and expanding that planet (even if that metaphor doesn’t scan very well). I don’t have to like all of the consequences of choosing to live on that planet, but having made that choice, it makes little sense to carp about its basic principles.

I hereby abandon argument about the useful distinction between names and addresses. Do what WebArch says. Give resources one URI. Provide representations for your resources. Choose a URI scheme that has useful retrieval semantics. That probably means HTTP. To the extent that the consequences of doing what WebArch says are painful, let’s work on fixing the pain.

Comments

If the "URI becomes both its name and its address" then http://example.org/path/to/book.xsd should always be the same (otherwise it's a different name thus stands for a different thing). No matter if you call it "canonical URL" or "URI", your catalog resolver will always fetch the same local copy of the resource identified via the URI [1]. If it doesn't find a local copy, it can try to load it from the URL (the same string as the URI). One string can be a URI when used as identifier/name, or as a URL when used as locator. (but TBL disagrees ...)

All this is not simple, but the fact that one string can serve as URI (eg a namespace name, or resource name used for fetching a local copy) and also serves as URL (eg to download the resource) is actually not a problem AFAICS, but can be quite convenient and useful, if everyone participating is aware of that fact.

The problems arise when people make up identifiers by filling in some local path which isn't the name of the resource; you can't know what their made-up name refers to. It's as if everyone made up new names for everything; they shouldn't be surprised if no one understands them anymore.

[1] Just like SGML "implementations made use of the fact that they could map from the name, the public identifier, to the appropriate local representation."

'266 North Pleasant Street' isn't a name for you; it's a name for your house. You're related to, but not the same as, your house.

On the web, http://norman.walsh.name/ is both a name and a location for your web site. Again, it's related to you, but it's not you.

You seem to use http://norman.walsh.name/knows/who#norman-walsh to identify yourself. On the web, that's your name. Or... one of your names, anyway.

The alternatives you suggest, Dan, are all addresses. They're also names. I've argued in the past that my name is not my address, and the point of the title of this essay is that I'm abandoning that line of argument. Names and addresses are the same thing on the web.

But you're right, in the semantic web context, I use http://norman.walsh.name/knows/who#norman-walsh as my canonical name.

The fact that, as a practical matter, I sometimes wish I'd used http://norman.walsh.name/knows/who/norman-walsh is a topic for another essay.