Another whack at a permathread in web architecture. A new URI scheme is not necessary to solve the perceived problem of names and addresses, nor does it actually solve it.
Time and again, we see individuals and organizations inventing new URI schemes in order to tackle the problem of “names” versus “addresses”. That is, they want to provide some sort of a globally unique identifier for “This Thing” independent of where representations of that thing might reside. Almost inevitably, these individuals and organizations fall into the trap of thinking that an “http” URI is somehow an address and not a name and is, therefore, inappropriate for their purpose. They are mistaken. I used to believe this too and I was wrong. A new URI scheme is not necessary, nor does it actually solve the problem.
I fear that most of this essay will recapitulate arguments already presented in URNs, Namespaces and Registries, a TAG Finding under development by Henry Thompson and David Orchard, but this misunderstanding about the nature of URIs is so common, I think it probably bears repeating. (I'm not going to recapitulate all of the arguments, so please don't consider this essay any sort of substitute for the finding.)
By any other name…
If I hold up an object, an object that you have never seen before and which you do not recognize, and I tell you the name of this object is “HB88”, then that is its name. You might invent other names for it, “that weird cube”, “the object Norm held up in the meeting”, “Fred”, but you have no grounds to argue that “HB88” is not its name. One reason names exist is to facilitate the social process of communication. If you walk up to me after the meeting and ask, “Norm, can I see HB88?” and I respond, “Huh? What's that?” then I've violated your social expectation about names. But let's agree that we'll uphold social expectations. You'll ask me for HB88 and I'll hand it to you.
In fact, there is a potential problem with the name “HB88”: it might be ambiguous. There might be a dozen things named HB88 in the world and so I might not know which one you meant. But we're really going to be talking about URIs so we can avoid that problem.
Now, clear your mind of preconceptions. No preconceptions. Are you ready? If I hold up another object and tell you that the name of this object is “http://norman.walsh.name/knows/what#nikon-5700”, then that's its name. Don't try to decode that string for a moment, just accept it as a string. I assert that there's nothing intrinsic about that string that makes it less suitable as a name than “HB88”. Ok, it's sort of long and it's hard to pronounce, but let's agree those aren't important problems. I think we can agree that, in principle, it's just a string and if I use it as a name, it is a name.
Trouble is, the chances are good that you've been using the web for a while and you've become used to typing strings like “http://…” into your web browser. You do have some preconceptions about those names. I'll bet that many of you recognize “norman.walsh.name” as the name of a particular machine somewhere on the network and that some among you recognize that “http” identifies a protocol that can be used to transmit representations of the resource over the network.
Consequently, you might argue that the “http” name is a bad choice. It is tied to a particular machine on the network and requires a particular protocol to access it. You are all too aware that DNS registrations expire and may change hands and that deleting or renaming files on that machine is likely to change the retrieval characteristics of the resource in question.
Instead, you might propose that the name “newscheme:x:y:n5700” is a much better name because it doesn't suffer from any of those problems.
But wait a minute. You've gotten way the heck out in front of me. We were talking about names. We weren't talking about retrievability or persistence or anything like that. None of those arguments has anything to do with names. As names, “HB88”, “http://norman.walsh.name/knows/what#nikon-5700”, and “newscheme:x:y:n5700” are entirely equivalent. They're just strings.
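A minimal sketch of that point, assuming nothing more than a lookup table: the three names from the essay are used as plain dictionary keys, and the code never parses them. The `registry` contents and the `lookup` helper are hypothetical, made up for illustration.

```python
# The three names from the essay, treated purely as strings.
NAMES = [
    "HB88",
    "http://norman.walsh.name/knows/what#nikon-5700",
    "newscheme:x:y:n5700",
]

# Hypothetical registry: each name is bound to a placeholder "thing".
registry = {name: f"thing-{i}" for i, name in enumerate(NAMES)}


def lookup(name: str) -> str:
    """Return whatever the name denotes; the string is never decoded."""
    return registry[name]


for name in NAMES:
    # Every name works the same way, whatever it happens to look like.
    print(f"{name!r} names {lookup(name)}")
```

Nothing in the lookup cares whether a key starts with “http” or “newscheme” or neither; any difference has to come from the requirements added on top of naming.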
It turns out that what you really want then is a name that can be created in a distributed fashion, unambiguously identifies the resource you intended, is persistent, and can be used to retrieve representations. Ok, on that basis, let's toss “HB88” as inadequate and concentrate on the other two names.
There are lots of ways to achieve distributed naming, but most of them are pretty useless for this task because we don't just want names, we want (reasonably) memorable names. Computers have no trouble dealing with UUIDs, but humans just can't cope.
That makes the distributed naming task a social one. First, we establish some space for names, then we set up some system for dividing up the naming space, and then we set up some system for handing portions of it out to the folks that need to make up names.
In the “http” case, I've leased a hunk of the naming space by registering norman.walsh.name, so I get to make up names under that domain and you don't.
There's no technical solution to the problem of unlicensed use of names (making up names in a part of the space you don't have license to use), so that's an orthogonal issue.
In the “newscheme” case, we'll have to create some other organization to create, divide, and maintain the naming space. If I had to place a bet, I'd gamble that the organization that maintains DNS names will outlast any new organization created to manage “newscheme” names, but I suppose that's not really an issue for organizations with deep enough pockets.
The extent to which either kind of name satisfies our requirements for distributed names is dependent on the mechanisms that are established to facilitate their creation.
Assuming that there are no bugs in our system for assigning distributed names, the extent to which any individual name is ambiguous is an entirely social issue.
There's nothing that can be done technically to prevent me from using the same name for two different things. As long as there's nothing done technically that requires me to use the same name for two different things, it's just a matter of diligence and trust.
On this score, the “http” name and the “newscheme” name are indistinguishable.
The hard part of persistence, like distributed naming and ambiguity, is social. The only thing that makes a name not persistent is if someone uses it ambiguously. We've already agreed not to do that, so there's really no problem.
In the case of the “http” name, the only part of the name that appears beyond my control is the DNS name. If I fail to maintain my registration, it may get reassigned and the new owner may not feel any obligation to maintain the unambiguous nature of the names I assigned. The solution for this problem seems straightforward to me. If you're worried about the persistence of my names, ask me to demonstrate that I've purchased a 10-year lease on the domain name, or a 100-year lease, or a lease in perpetuity if you've got the legal framework to do that. Problem sorted.
In the case of “newscheme” names, users actually have to be persuaded to follow the mandate of using only the portions of the namespace that they've registered (considering the widespread use of unregistered URI schemes and unregistered URN namespaces, there's no reason to be optimistic on this score), the organization created to manage the naming space has to exist indefinitely, and it has to successfully manage the naming space.
Again, on this score, I think the safe money is on the DNS system.
One common social contract is that a name, once created, will always refer to the same thing or sequence of bits. So, if I ask you for something with a particular name and you return “101010”, I can be certain that you will always return “101010” when I ask for that name. This is a matter of trust between us. Unless the sequence of bits is actually encoded in the name, there is nothing about the name per se that can enforce this constraint.
The same is true for any representation and any negotiated set of acceptable alterations to the sequence of bits (or whatever is returned).
So in terms of fidelity, all names are created equal.
By extension, if we agree to allow copies of the resource to be distributed across the network, the extent to which we can be sure that all the copies have appropriate fidelity is independent of the name of the resource.
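To make the “encoded in the name” exception concrete, here is a rough sketch. It assumes a SHA-256 digest as a stand-in for literally encoding the bits, and the `sha256:` prefix is a hypothetical label, not a registered scheme. With a name like that, anyone holding a copy can check it; with any other name, the check simply isn't available and trust has to do the work.

```python
import hashlib


def digest_name(bits: bytes) -> str:
    """Build a name from the bits themselves (hypothetical 'sha256:' label)."""
    return "sha256:" + hashlib.sha256(bits).hexdigest()


def matches(name: str, bits: bytes) -> bool:
    """True only if the name was derived from exactly these bits."""
    return name == digest_name(bits)


original = b"101010"
name = digest_name(original)

print(matches(name, original))    # a faithful copy checks out
print(matches(name, b"101011"))   # an altered copy does not

# An ordinary name carries no such check, whatever its scheme.
print(matches("http://norman.walsh.name/knows/what#nikon-5700", original))
```

The check works the same for every copy on the network, which is the sense in which fidelity is independent of where a copy came from.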
So the question becomes simply, is the name retrievable?
On this score, the “http” name has a clear advantage. There already exists a huge, deployed infrastructure for handling efficient, cached distribution of resources over the HTTP protocol and tying the “http” name to that infrastructure is dead obvious.
It's important to note two things here:
Contrary to what you may believe, there is nothing about the “http” URI scheme that requires use of the “http” protocol. And even where the “http” protocol is used, there's nothing about it that requires access to any particular machine.
On your desktop, your web browser may return things from its cache without ever hitting the web.
On my desktop, I've got even more infrastructure in place. An attempt to retrieve an http: scheme URI starts by looking up that URI as a name in a table and returning a local copy of the resource if it exists. If it doesn't exist in that table, it goes to a local proxy which may return a cached copy without ever hitting the web. Beyond that, the request goes off into the web where other caching proxies may come into play.
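The lookup chain described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual setup on my desktop: the `retrieve` function, the two dictionaries, and the `fake_web` stand-in for a real network fetch are all hypothetical.

```python
from typing import Callable, Dict, Tuple


def retrieve(
    uri: str,
    catalog: Dict[str, bytes],
    proxy_cache: Dict[str, bytes],
    fetch: Callable[[str], bytes],
) -> Tuple[str, bytes]:
    """Return (where the bits came from, the representation)."""
    # 1. Treat the URI as a name in a local table.
    if uri in catalog:
        return "catalog", catalog[uri]
    # 2. Ask the local proxy, which may already hold a copy.
    if uri in proxy_cache:
        return "proxy", proxy_cache[uri]
    # 3. Only now go out to the web, and remember the answer.
    body = fetch(uri)
    proxy_cache[uri] = body
    return "web", body


def fake_web(uri: str) -> bytes:
    """Stand-in for a real HTTP request, so the sketch runs offline."""
    return b"representation of " + uri.encode("utf-8")


catalog = {"http://norman.walsh.name/knows/what": b"local copy"}
cache: Dict[str, bytes] = {}

print(retrieve("http://norman.walsh.name/knows/what", catalog, cache, fake_web)[0])
print(retrieve("http://www.w3.org/", catalog, cache, fake_web)[0])
print(retrieve("http://www.w3.org/", catalog, cache, fake_web)[0])
```

The first request is answered from the table, the second goes to the (fake) web, and the third is answered by the proxy cache. In every case the caller used the same “http” name.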
Even in the case where there are no cached representations and you're using HTTP to connect to a particular host name, there is no reason to believe that a given host name (e.g., “norman.walsh.name”) refers to a single, physical machine. Although my domain is on a single machine, the W3C domain is hosted on at least a half dozen machines around the world. So an attempt to access a resource on www.w3.org doesn't actually imply a network transfer across the globe to some machine in Cambridge, MA. On an even bigger scale, companies like Akamai have built a business around transparent, global distribution of enormous quantities of information.
So there's nothing unsuitable about “http” names from the perspective of retrievability.
For the “newscheme” names, making them retrievable would require deployment of an entire infrastructure in parallel to the infrastructure that already exists for HTTP.
In practice, this is so expensive, difficult, and impractical that most systems defer to HTTP for the actual retrieval. So the mechanism for retrieving “newscheme:x:y:n5700” is to retrieve “http://example.org/resolver?newscheme:x:y:n5700” (or some such mechanism), which makes the actual bits returned subject to exactly the same issues as simply using an “http” name in the first place.
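A small sketch of that deferral, assuming the resolver location from the example above: the `resolver_url` helper is hypothetical, and real resolver services may well encode the name differently.

```python
from urllib.parse import urlsplit

# Resolver location taken from the essay's example; any real service
# would publish its own.
RESOLVER = "http://example.org/resolver"


def resolver_url(name: str, resolver: str = RESOLVER) -> str:
    """Map a 'newscheme' name to the http URI actually used to fetch it."""
    return f"{resolver}?{name}"


name = "newscheme:x:y:n5700"
url = resolver_url(name)

print(url)
print(urlsplit(name).scheme)  # the scheme of the name you were handed
print(urlsplit(url).scheme)   # the scheme that actually does the retrieval
```

Whatever scheme the name starts with, the thing that gets dereferenced is an “http” URI on somebody's host, with all the DNS and hosting dependencies that implies.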
Beyond the fact that there's no reason to invent a new scheme, one of the things I find personally irksome about most of the proposals for new schemes is that the organizations proposing them want me to pay money for the privilege of using them.
I've already paid to register my domain name. I pay a hosting company to provide a server that responds to requests to access that domain name. I need to pay for a different URI scheme, why exactly?
In fairness, the organizations need to make enough money to stay in business so that they can live up to their obligations of managing the naming space and the other social aspects of names. Problem is, I don't see them offering anything I don't already have.
URIs are names
They're all names. There's no technical reason to invent new URI schemes to address the goal of providing names that can be created in a distributed fashion, that unambiguously identify a resource, are persistent, and can be used to retrieve representations.
I hope this essay helps to clarify that we already have all we need.