<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:dc='http://purl.org/dc/elements/1.1/'
         xmlns:gal='http://norman.walsh.name/rdf/gallery#'>
<info>
<title>On the Web, My Name is 266 North Pleasant Street</title>
<volumenum>7</volumenum>
<issuenum>31</issuenum>
<pubdate>2004-03-03T14:41:00+01:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2004</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>There has been a long debate, both philosophical and technical, on the
relative merits of the distinction (or lack thereof) between names and
addresses. I’ve said my piece.</para>
</abstract>
</info>

<epigraph>
<attribution><personname>
<firstname>John</firstname><surname>Dewey</surname>
</personname></attribution>
<para xml:id='p1'>Man is not logical and his intellectual history is a record of mental
reserves and compromises. He hangs on to what he can in his old
beliefs even when he is compelled to surrender their logical basis.</para>
</epigraph>

<para xml:id='p2'>Before the web, there was SGML. SGML identifies external
subsets, external parsed and unparsed entities, notations, and perhaps
a few other things I’ve forgotten about, with <emphasis>external identifiers</emphasis>.
External identifiers have two parts: a public identifier and a system
identifier. The public identifier is “a name” and the system
identifier is “a location”.</para>

<para xml:id='p3'>Historically, system identifiers weren’t URIs and what was a reasonable
identifier in one system might have been unintelligible in
another. Public identifiers provided a hook for interoperability. Both
systems could find the external identifier associated with this
document type declaration:</para>

<programlisting>&lt;!DOCTYPE book PUBLIC "-//Owner//DTD Name//EN" "c:\:/name.dtd"&gt;</programlisting>

<para xml:id='p4'>because they had the name even if they didn’t understand the location.
In fact, in SGML, the system
identifier was entirely optional:</para>

<programlisting>&lt;!DOCTYPE book PUBLIC "-//Owner//DTD Name//EN"&gt;</programlisting>

<para xml:id='p5'>because implementations made use of the fact that they could map from
the name, the public identifier, to the appropriate local representation.</para>

<para xml:id='p6'><link xlink:href="http://www.oasis-open.org/">OASIS</link>,
then called SGML Open, defined a standard mechanism for describing this mapping in
<link xlink:href="http://www.oasis-open.org/html/tr9401.html">TR 9401:1997</link>,
known colloquially as “SGML Open Catalogs” or “SOCATs”.</para>

<para xml:id='p7'>External identifiers
survived into XML 1.0. To conform to the evolving
architecture of the web, XML made system identifiers required.</para>

<para xml:id='p8'>Over the course of more than 10 years working with SGML and XML documents,
the presence of names in external identifiers has saved
many hours, perhaps many hundreds of hours, of my time. I consider that positive
value.</para>

<para xml:id='p9'>As XML developed, I tried, unsuccessfully, to extend the notion of
names and identifiers into the new technologies that were developing (stylesheets,
schemas, etc.). With
<personname><firstname>Paul</firstname><surname>Grosso</surname></personname> and
<personname><firstname>John</firstname><surname>Cowan</surname></personname>,
I wrote
<link xlink:href="http://www.ietf.org/rfc/rfc3151.txt">RFC 3151</link>,
<citetitle>A URN Namespace for Public Identifiers</citetitle>, in order
to preserve public identifiers in a URI-only world.</para>
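
<para>A small illustration, using the sample public identifier from earlier:
under the RFC’s transcription rules, each “//” becomes a colon and each space
becomes a plus sign. The resulting URN can then appear wherever a URI is
expected. Using it as a system identifier, as below, is just one assumed
example of such a place:</para>

<programlisting>&lt;!DOCTYPE book SYSTEM "urn:publicid:-:Owner:DTD+Name:EN"&gt;</programlisting>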

<para xml:id='p10'>I’ve argued my case in many forums. Most recently, this came up in
<link xlink:href="http://www.imc.org/atom-syntax/mail-archive/msg02850.html">a thread</link>
on the
<link xlink:href="http://www.intertwingly.net/wiki/pie/FrontPage">Atom</link>
mailing list. I have always been in the
minority, though I have sometimes been encouraged by like-minded
colleagues. </para>

<section xml:id="webarch">
<title>Web Architecture</title>

<para xml:id='p11'>On the other hand, as a member of the
<link xlink:href="http://www.w3.org/2001/tag">Technical Architecture Group</link>
at the
<link xlink:href="http://www.w3.org/">W3C</link>,
I have explicitly voted to approve
<link xlink:href="http://www.w3.org/2001/tag/2003/webarch-20031128/">Architecture
of the World Wide Web 1.0</link>
as a consensus opinion on web architecture.</para>

<para xml:id='p12'>That document says, in part:</para>

<itemizedlist>
<listitem>
<para xml:id='p13'><link xlink:href="http://www.w3.org/2001/tag/2003/webarch-20031128/#id-with-URI">The
Identification mechanism for the Web is the URI</link>.
</para>
</listitem>

<listitem>
<para xml:id='p14'><link xlink:href="http://www.w3.org/2001/tag/2003/webarch-20031128/#pr-service-uri">Publishers
of a URI should provide representations of the identified resource
consistently and predictably</link>.</para>
</listitem>

<listitem>
<para xml:id='p15'><link xlink:href="http://www.w3.org/2001/tag/2003/webarch-20031128/#uri-aliases">Resource
owners should not create arbitrarily different URIs for the same resource</link>.
</para>
</listitem>

<listitem>
<para xml:id='p16'><link xlink:href="http://www.w3.org/2001/tag/2003/webarch-20031128/#pr-new-scheme-expensive">Authors
of specifications should not introduce a new URI scheme when an
existing scheme provides the desired properties of identifiers and
their relation to resources</link>.
</para>
</listitem>
</itemizedlist>

<para xml:id='p17'>So: I’ve got a new resource that I want to identify.
Given my public commitment to the WebArch document, I feel that I ought not to
violate its tenets. That means
I want to use a URI, I want to provide a representation, I don’t want to create
multiple URIs, and I don’t want to use a new scheme.</para>

<para xml:id='p18'>The WebArch document expresses an
<link xlink:href="http://www.w3.org/2001/tag/2003/webarch-20031128/#URI-scheme">explicit
bias</link> towards HTTP. There’s a whole set of infrastructure built around
HTTP that makes it a pretty compelling protocol if you’re going to serve up
a representation.</para>

<para xml:id='p19'>That means I’m going to identify my document with an HTTP URI and only
an HTTP URI. That URI becomes <emphasis>both</emphasis> its name and its address,
if you like (or even if you don’t).</para>
</section>

<section xml:id="notquitelost">
<title>All Is (Not Quite) Lost</title>

<para xml:id='p20'>I’ve lost my names. Presented with a document, I will be forced to
figure out what representation to use to process it based only
on its single URI.</para>

<para xml:id='p21'>Remember my document interchange scenario? That’s where folks send me
documents to process. It still happens, so what do I do with this
document:</para>

<programlisting>&lt;book xsi:noNamespaceSchemaLocation="http://example.org/book.xsd"&gt;</programlisting>

<para xml:id='p22'>On the web, maybe that’s easy: I just go off and get the resource. At this
point, the
infrastructure that I mentioned earlier comes into play. Perhaps some
intermediate cache will return the representation, perhaps the server will tell us the
document has moved and another GET will be issued, etc. But what if I’m not connected?</para>

<para xml:id='p23'>I get some significant relief from 
<link xlink:href="http://www.oasis-open.org/committees/download.php/4952/wd-entity-xml-catalogs-1.0_2e.html">XML
Catalogs</link>, developed by the
 <link xlink:href="http://www.oasis-open.org/committees/entity/">Entity
Resolution Technical Committee</link> at OASIS.
XML Catalogs provide for XML what SOCATs provide for SGML.
In particular, they allow me to map external identifiers <emphasis>and URIs</emphasis>
to
local representations. So I can use this entry to map the URI:</para>

<programlisting>&lt;uri name="http://example.org/book.xsd"
     uri="/my/local/path/to/book.xsd"/&gt;</programlisting>

<para xml:id='p24'>Alas, it’s not a total win.
What about documents like this:</para>

<programlisting>&lt;book xsi:noNamespaceSchemaLocation="../../book.xsd"&gt;</programlisting>

<para xml:id='p25'>If I don’t have <filename>book.xsd</filename> in the same relative location
as the sender, I lose. And in this case:</para>

<programlisting>&lt;book xsi:noNamespaceSchemaLocation="file:///c:/path/to/book.xsd"&gt;</programlisting>

<para xml:id='p26'>I just lose outright, although in this case I could argue that the author
is at fault: he’s given a different URI to the same resource, bifurcating the
web. But if caches or resolvers of some sort aren’t widely deployed, authors will
do this, because they don’t have a practical alternative, and I lose.</para>

</section>
<section xml:id="planetweb">
<title>Planet Web</title>

<para xml:id='p27'>I live on <link xlink:href="http://www.imc.org/atom-syntax/mail-archive/msg03016.html">Planet
Web</link> too. I pour a fair amount of my intellectual effort into understanding 
and expanding that planet (even if that metaphor doesn’t scan very well). I don’t
have to like all of the consequences of choosing to live on that planet, but having
made that choice, it makes little sense to carp about its basic principles.</para>

<para xml:id='p28'>I hereby abandon argument about the useful distinction between
names and addresses. Do what WebArch says. Give resources one URI.
Provide representations for your resources.
Choose a URI scheme that has useful retrieval semantics. That probably means
HTTP. To the extent that the consequences of doing what WebArch says are
painful, let’s work on fixing the pain.</para>

</section>
</essay>
