Human Readable Resource Identifiers

Volume 10, Issue 40; 30 Apr 2007; last modified 08 Oct 2010

Dealing with the things that you type that look mostly like URIs but aren't.

It's a curious thing about our industry: not only do we not learn from our mistakes, we also don't learn from our successes.

—Keith Braithwaite

There are lots of places where we expect authors to type URI values. Left to their own devices, authors type these identifiers in a “human readable” form; that is, they may contain spaces, punctuation characters, non-ASCII text, etc.

Consider the current state of play in the XML specifications:

Although we think of, and casually describe, XML system identifiers as URIs (or, more accurately, IRIs), both XML 1.0 and XML 1.1 describe system identifiers as strings “meant to be converted to URI reference(s)”. Converted, in this case, meaning mostly percent-encoding various characters not allowed in URIs.

Historically, this was a necessary compromise with SGML where system identifiers are just strings that the, uhm, system can use to identify an entity. Given the intentionally open-ended definition of system identifiers in SGML, there were bound to be legacy identifiers that contained spaces and non-ASCII characters and all sorts of stuff.

It was also done in recognition of the fact that human authors often use invalid characters in identifiers. Consider the number of HTML documents that have spaces in href attributes. Users are used to browsers doing the right thing and it was reasonable to make sure XML processors would do the same right thing.
XLink 1.0 goes to considerable trouble to define special processing for xlink:href attributes. In this case, the analogy with href attributes in HTML is perfect.
XML Base copies the XLink text for encoding and escaping the xml:base attribute value. Again, for the same reasons.
XML Schema Part 2, in discussion of the lexical space of xsd:anyURI values, appeals directly to the XLink 1.0 text.
XInclude uses a reference to the XML 1.1 processing to accomplish the same task for its href attribute.

(Those are just the specifications I could think of off the top of my head that make reference to this special processing for “human readable” resource identifiers; there may be others.)
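To make the "conversion" concrete, here is a minimal Python sketch of the kind of escaping these specifications describe. The set of escaped characters is an assumption, modelled on the XML system identifier rules (control characters, space, a handful of ASCII delimiters, and everything outside ASCII); the exact set differs slightly from spec to spec, and the name escape_identifier is hypothetical.

```python
# Sketch only: the character set below is an assumption modelled on the
# XML system identifier rules, not text taken from any one specification.
DISALLOWED_ASCII = set(' <>"{}|\\^`')


def escape_identifier(text: str) -> str:
    """Percent-encode characters that are not allowed in a URI reference.

    Each offending character is converted to UTF-8 and every resulting
    byte is written as %HH. Existing '%' and '#' characters are left alone.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0x1F or cp >= 0x7F or ch in DISALLOWED_ASCII:
            # Control characters, DEL, non-ASCII, and the listed delimiters.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_identifier("my file.xml"))                 # my%20file.xml
print(escape_identifier("r\u00e9sum\u00e9 notes.xml"))  # r%C3%A9sum%C3%A9%20notes.xml
```

Because "%" is never escaped in this sketch, running it over an identifier that has already been converted leaves it unchanged.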

Many of these documents were written before, or while, the IRI specification was being written. When it came time to consider, yet again, the same text in the context of XLink 1.1, after IRIs were defined, the fact that IRIs don't allow spaces meant we couldn't just excise it all; we would have to craft it again.

The fact that it's copied and referenced all over the place gave us pause. For one thing, it meant we had to be extra careful. For another, any sober reflection on the situation is bound to conclude that XLink is just the wrong place for this text.

Having specs totally unrelated to XLink pointing into it just for a standard description of how to deal with invalid, but entirely expected, characters in URI values doesn't make any sense.

What we decided to do instead was attempt to publish Human Readable Resource Identifiers (HRRIs) as an RFC.

The text is short and straightforward and will likely be of value outside of the XML context. And given that URIs and IRIs are defined by RFCs, that seems like the right place for this text.

The first Internet Draft of HRRIs has now been published.

Comments most welcome and appreciated, of course. The best place to send them is www-xml-linking-comments@w3.org.

Comments

Oh no, not another *R? syntax! Seriously, it's a great pity that these worthwhile extensions didn't get included in IRIs. I hope everybody has a really good think about whether there's anything else that needs to go in before this becomes an RFC.

Yes! In fact we have recently implemented a similar set of rules in our XForms engine. Now we would have a "spec" to go by. The use case has to do with the XForms submission construct, which can be used to send an HTTP request. This can be used to send some XQuery to eXist on the URI. The URI is provided in the action attribute of xforms:submission and it would be convenient to write something like:

action="/exist/rest/db/mycollection?_query=element count { count(/*) }"

This would not be a valid URI, but it is a valid HRRI which can be converted to a URI following the set of rules you and Richard proposed. Is this in line with the use cases you have in mind?

Alex