There has been long debate, both philosophical and technical, on the relative merits of the distinction (or lack thereof) between names and addresses. I’ve said my piece.

Man is not logical and his intellectual history is a record of mental reserves and compromises. He hangs on to what he can in his old beliefs even when he is compelled to surrender their logical basis.

John Dewey

Before the web, there was SGML. SGML identifies external subsets, external parsed and unparsed entities, notations, and perhaps a few other things I’ve forgotten about, with external identifiers. External identifiers have two parts: a public identifier and a system identifier. The public identifier is “a name” and the system identifier is “a location”.

Historically, system identifiers weren’t URIs and what was a reasonable identifier in one system might have been unintelligible in another. Public identifiers provided a hook for interoperability. Both systems could find the external identifier associated with this document type declaration:

  <!DOCTYPE book PUBLIC "-//Owner//DTD Name//EN" "c:\:/name.dtd">

because they had the name even if they didn’t understand the location. In fact, in SGML, the system identifier was entirely optional:

  <!DOCTYPE book PUBLIC "-//Owner//DTD Name//EN">

because implementations made use of the fact that they could map from the name, the public identifier, to the appropriate local representation.

OASIS, then called SGML Open, defined a standard mechanism for describing this mapping in TR 9401:1997, known colloquially as “SGML Open Catalogs” or “SOCATs”.

External identifiers survived into XML 1.0. To conform to the evolving architecture of the web, XML made system identifiers mandatory.

Over the course of more than 10 years working with SGML and XML documents, the presence of names in external identifiers has saved many hours, perhaps many hundreds of hours, of my time. I consider that positive value.

As XML developed, I tried, unsuccessfully, to extend the notion of names and identifiers into the new technologies that were developing (stylesheets, schemas, etc.). With Paul Grosso and John Cowan, I wrote RFC 3151, A URN Namespace for Public Identifiers, in order to preserve public identifiers in a URI-only world.
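To give a feel for what that URN form looks like, here is a minimal Python sketch of the public-identifier-to-URN transcription as I read it in RFC 3151: whitespace is normalized, "//" becomes ":", "::" becomes ";", a space becomes "+", and a handful of reserved characters are percent-escaped. The function name publicid_to_urn is made up for illustration, and the table should be checked against the RFC before anyone relies on it.

```python
# Single-character transcriptions (public identifier -> URN), per my reading
# of the table in RFC 3151. Verify against the RFC before relying on this.
_SINGLE = {
    "+": "%2B", ":": "%3A", "/": "%2F", ";": "%3B", "'": "%27",
    "?": "%3F", "#": "%23", "%": "%25", " ": "+",
}


def publicid_to_urn(public_id):
    """Transcribe a public identifier into a urn:publicid: URN (sketch)."""
    # Trim leading/trailing whitespace and collapse internal runs to one space.
    normalized = " ".join(public_id.split())
    out = []
    i = 0
    while i < len(normalized):
        pair = normalized[i:i + 2]
        if pair == "//":      # field separator
            out.append(":")
            i += 2
        elif pair == "::":    # double colon
            out.append(";")
            i += 2
        else:                 # everything else is handled one character at a time
            ch = normalized[i]
            out.append(_SINGLE.get(ch, ch))
            i += 1
    return "urn:publicid:" + "".join(out)


print(publicid_to_urn("-//Owner//DTD Name//EN"))
# urn:publicid:-:Owner:DTD+Name:EN
```

The point of the exercise is that the resulting URN is still a name: it carries no retrieval semantics of its own, so something like a catalog is still needed to turn it into a representation.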

I’ve argued my case in many forums. Most recently, this came up in a thread on the Atom mailing list. I have always been in the minority, though I have sometimes been encouraged by like-minded colleagues.

Web Architecture 

On the other hand, as a member of the Technical Architecture Group at the W3C, I have explicitly voted to approve Architecture of the World Wide Web 1.0 as a consensus opinion on web architecture.

That document says, in part:

So: I’ve got a new resource that I want to identify. Given my public commitment to the WebArch document, I feel that I ought not to violate its tenets. That means I want to use a URI, I want to provide a representation, I don’t want to create multiple URIs, and I don’t want to use a new scheme.

The WebArch document expresses an explicit bias towards HTTP. There’s a whole set of infrastructure built around HTTP that makes it a pretty compelling protocol if you’re going to serve up a representation.

That means I’m going to identify my document with an HTTP URI and only an HTTP URI. That URI becomes both its name and its address, if you like (or even if you don’t).

All Is (Not Quite) Lost 

I’ve lost my names. Presented with a document, I will be forced to figure out what representation to use to process it based only on its single URI.

Remember my document interchange scenario? That’s where folks send me documents to process. It still happens, so what do I do with this document:

  <book xsi:noNamespaceSchemaLocation="http://example.org/book.xsd">

On the web, maybe that’s easy: I just go off and get the resource. At this point, the infrastructure that I mentioned earlier comes into play. Perhaps some intermediate cache will return the representation, perhaps the server will tell us the document has moved and another GET will be issued, etc. But what if I’m not connected?

I get some significant relief from XML Catalogs, developed by the Entity Resolution Technical Committee at OASIS. XML Catalogs provide for XML what SOCATs provide for SGML. In particular, they allow me to map external identifiers and URIs to local representations. So I can use this entry to map the URI:

  <uri name="http://example.org/book.xsd"
       uri="/my/local/path/to/book.xsd"/>
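The lookup a catalog-aware resolver performs for such an entry can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: it handles nothing but exact-match uri and public entries, the helper names (load_catalog, resolve_uri, resolve_public) and the local paths are made up for illustration, and a real resolver implements far more of the OASIS specification (rewriting, delegation, chained catalogs, and so on).

```python
import xml.etree.ElementTree as ET

# Namespace of the OASIS XML Catalogs elements.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog; the local paths are placeholders.
CATALOG = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//Owner//DTD Name//EN"
          uri="/my/local/path/to/name.dtd"/>
  <uri name="http://example.org/book.xsd"
       uri="/my/local/path/to/book.xsd"/>
</catalog>
"""


def load_catalog(text):
    """Return (uri_map, public_map) built from exact-match catalog entries."""
    root = ET.fromstring(text)
    uri_map = {e.get("name"): e.get("uri")
               for e in root.iter("{%s}uri" % CATALOG_NS)}
    public_map = {e.get("publicId"): e.get("uri")
                  for e in root.iter("{%s}public" % CATALOG_NS)}
    return uri_map, public_map


def resolve_uri(uri_map, uri):
    """Map a URI to its local copy; fall back to the URI itself if unmapped."""
    return uri_map.get(uri, uri)


def resolve_public(public_map, public_id):
    """Map a public identifier to its local copy, or None if it is unknown."""
    return public_map.get(public_id)


URI_MAP, PUBLIC_MAP = load_catalog(CATALOG)
print(resolve_uri(URI_MAP, "http://example.org/book.xsd"))
# /my/local/path/to/book.xsd
```

Note that the match is on the exact URI string. A URI that isn't in the catalog simply falls through unchanged, and the resolver is back to needing the network.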

Alas, it’s not a total win. What about documents like this:

  <book xsi:noNamespaceSchemaLocation="../../book.xsd">

If I don’t have book.xsd in the same relative location as the sender, I lose. And in this case:

  <book xsi:noNamespaceSchemaLocation="file:///c:/path/to/book.xsd">

I just lose outright, although in this case I could argue that the author is at fault: he’s given a different URI to the same resource, bifurcating the web. But if caches or resolvers of some sort aren’t widely deployed, authors will do this, because they don’t have a practical alternative, and I lose.
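The relative-reference failure is easy to see once the reference is resolved against each document's base URI. Here is a small sketch using Python's urllib.parse.urljoin; the file: URIs for the sender's copy and for mine are invented for illustration, as is the helper name schema_uri.

```python
from urllib.parse import urljoin

# The schema reference exactly as it appears in the document.
SCHEMA_REF = "../../book.xsd"

# Hypothetical locations of the same document on two machines.
SENDER_DOC = "file:///home/sender/project/docs/chapters/book.xml"
MY_DOC = "file:///home/me/inbox/book.xml"


def schema_uri(document_uri, reference=SCHEMA_REF):
    """Resolve the schema reference against the document's base URI."""
    return urljoin(document_uri, reference)


print(schema_uri(SENDER_DOC))  # file:///home/sender/project/book.xsd
print(schema_uri(MY_DOC))      # file:///home/book.xsd
```

The same document yields two different absolute URIs depending on where it happens to sit, so there is no single URI for a catalog entry to match on.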

Planet Web 

I live on Planet Web too. I pour a fair amount of my intellectual effort into understanding and expanding that planet (even if that metaphor doesn’t scan very well). I don’t have to like all of the consequences of choosing to live on that planet, but having made that choice, it makes little sense to carp about its basic principles.

I hereby abandon argument about the useful distinction between names and addresses. Do what WebArch says. Give resources one URI. Provide representations for your resources. Choose a URI scheme that has useful retrieval semantics. That probably means HTTP. To the extent that the consequences of doing what WebArch says are painful, let’s work on fixing the pain.