<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
<title>Human Readable Resource Identifiers</title><biblioid class="uri">http://norman.walsh.name/2007/04/30/hrri</biblioid>
<volumenum>10</volumenum>
<issuenum>40</issuenum>
<pubdate>2007-04-30T14:37:32-04:00</pubdate>
<date>$Date: 2007-04-30 15:43:46 -0400 (Mon, 30 Apr 2007) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2007</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>Dealing with the things that you type that look mostly like URIs but
aren't.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
</info>

<epigraph>
<attribution>
      <personname>
<firstname>Keith</firstname>
	<surname>Braithwaite</surname>
</personname>
    </attribution>
<para xml:id="p2">It's a curious thing about our industry: not only do we not
learn from our mistakes, we also don't learn from our successes.
</para>
</epigraph>

<para xml:id="p1">There are lots of places where we expect authors to
type URI values. Left to their own devices, authors type these
identifiers in a “human readable” form; that is, they may contain
spaces, punctuation characters, non-ASCII text, etc.</para>

<para xml:id="p3">Consider the
current state of play in the XML specifications:</para>

<itemizedlist>
<listitem>
<para xml:id="p4">Although we think of, and casually describe, XML system identifiers as
<wikipedia page="Uniform_Resource_Identifier">URI</wikipedia>s (or, more accurately,
<wikipedia page="Internationalized_Resource_Identifier">IRI</wikipedia>s), both
<link xlink:href="http://www.w3.org/TR/REC-xml/">XML 1.0</link> and
<link xlink:href="http://www.w3.org/TR/xml11/">XML 1.1</link> describe
<link xlink:href="http://www.w3.org/TR/REC-xml/#dt-sysid">system identifiers</link>
as strings “meant to be converted to URI reference(s)”. “Converted”, in this case,
mostly means percent-encoding the characters that are not allowed in URIs.</para>
<para xml:id="p5">Historically, this was a necessary compromise with
<wikipedia>SGML</wikipedia>
where system identifiers
are just strings that the, uhm, system can use to identify an entity.
Given the intentionally open-ended definition of system identifiers in SGML,
there were bound to be legacy identifiers that
contained spaces and non-ASCII characters and all sorts of stuff.</para>
<para xml:id="p6">It was also done in recognition of the fact that human authors often use invalid
characters in identifiers. Consider the number of HTML documents that have spaces
in <tag class="attribute">href</tag> attributes. Users are used to browsers doing the
right thing and it was reasonable to make sure XML processors would do the same right thing.
</para>
</listitem>
<listitem>
<para xml:id="p7"><link xlink:href="http://www.w3.org/TR/xlink/">XLink 1.0</link> goes to
considerable trouble to define
<link xlink:href="http://www.w3.org/TR/xlink/#link-locators">special
processing</link> for <tag class="attribute">xlink:href</tag> attributes.
In this case, the analogy with <tag class="attribute">href</tag> attributes
in HTML is perfect.</para>
</listitem>
<listitem>
<para xml:id="p8"><link xlink:href="http://www.w3.org/TR/xmlbase/">XML Base</link>
copies the XLink text for
<link xlink:href="http://www.w3.org/TR/xmlbase/#escaping">encoding
and escaping</link> the <tag class="attribute">xml:base</tag> attribute value.
Again, for the same reasons.</para>
</listitem>
<listitem>
<para xml:id="p9"><link xlink:href="http://www.w3.org/TR/xmlschema-2/">XML Schema Part 2</link>,
in discussion of the lexical space of <code>xsd:anyURI</code> values,
<link xlink:href="http://www.w3.org/TR/xmlschema-2/#anyURI-lexical-representation">appeals
directly</link> to the XLink 1.0 text.</para>
</listitem>
<listitem>
<para xml:id="p10"><link xlink:href="http://www.w3.org/TR/xinclude/">XInclude</link> uses
<link xlink:href="http://www.w3.org/TR/xinclude/#IRIs">a reference to the XML 1.1</link>
processing to accomplish the same task for its <tag class="attribute">href</tag>
attribute.</para>
</listitem>
</itemizedlist>

<para xml:id="p11">(Those are just the specifications I could think of off the top
of my head that make reference to this special processing for “human
readable” resource identifiers; there may be others.)</para>
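
<para>The conversion these specifications describe amounts mostly to
percent-encoding. As a rough illustration, and not the normative algorithm
from any of them, the following Python sketch assumes that the characters to
escape are the control characters, space, a handful of ASCII delimiters, and
everything above U+007F, each converted to UTF-8 and written as
<code>%HH</code>. The function name <code>escape_identifier</code> and the
constant <code>DISALLOWED_ASCII</code> are made up for this example.</para>

<programlisting language="python"><![CDATA[
```python
# Assumption: these ASCII delimiters are escaped, in addition to control
# characters, space, DEL, and every character above U+007F.
DISALLOWED_ASCII = set('<>"{}|\\^`')


def escape_identifier(text: str) -> str:
    """Percent-encode characters assumed to be disallowed in a URI reference.

    '%' and '#' are deliberately left alone, so a string that is already
    escaped, or that carries a fragment identifier, passes through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x20 or cp >= 0x7F or ch in DISALLOWED_ASCII:
            # Convert the character to UTF-8 and escape each byte as %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```
]]></programlisting>

<para>Under those assumptions, <code>escape_identifier("my file.xml")</code>
returns <code>my%20file.xml</code>, while a string that is already a legal
URI reference comes back untouched.</para>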

<para xml:id="p12">Many of these documents were written before, or while, the
<link xlink:href="http://www.ietf.org/rfc/rfc3987.txt">IRI specification</link> was
being written. When it came time to consider the same text yet again, in the
context of
<link xlink:href="http://www.w3.org/TR/xlink/">XLink 1.1</link>, IRIs had been
defined, but because IRIs don't allow spaces we couldn't just
excise it all; we would have to craft it again.</para>

<para xml:id="p13">The fact that it's copied and referenced all over the place gave
us pause. For one thing, it meant we had to be extra careful. For
another, any sober reflection on the situation is bound to conclude that XLink is
just the wrong place for this text.</para>

<para xml:id="p14">Having specs totally unrelated to XLink pointing into it just
for a standard description of how to deal with invalid, but entirely expected,
characters in URI values doesn't make any sense.</para>

<para xml:id="p15">What we decided to do instead was attempt to publish
<link xlink:href="http://www.w3.org/XML/2007/04/hrri/">Human Readable Resource
Identifiers</link> (HRRIs) as an RFC.</para>

<para xml:id="p16">The text is short and straightforward and will likely be of
value outside of the XML context. And given that URIs and IRIs are
defined by RFCs, that seems like the right place for this text.</para>

<para xml:id="p17">The first
<link xlink:href="http://www.ietf.org/internet-drafts/draft-walsh-tobin-hrri-00.txt">Internet
Draft</link> of HRRIs has now been published.</para>

<para xml:id="p18">Comments most welcome and appreciated, of course. The best place to send them
is
<link xlink:href="mailto:www-xml-linking-comments@w3.org">www-xml-linking-comments@w3.org</link>.</para>

</essay>

