On the applicability of catalog resolution
As a strong proponent of XML Catalogs, I'm sometimes asked, “should catalog resolution be used for …?” The answer is “yes”.
The things we have most longed for do not happen; or if they do, it is never at the time nor under the circumstances when they could have made us happiest.
I'm a strong proponent of XML Catalogs. Not everyone believes in them, but I do.
As a consequence, I'm sometimes asked, “should catalog resolution be used for” followed by some context in which a URI can appear that an application might attempt to retrieve: the system and public identifiers of documents and entities, the URI components of schema location hints in XML Schema, the href pseudo-attribute of the XML stylesheet processing instruction, the href attributes of include and import elements in XSLT, the URI in an XSL or CSS property value, the href attribute of HTML, etc.
The answer is “yes”. Catalog resolution applies to any URI that a web cache applies to: all of them.
If you're writing an application, an entity resolver should be used for all external entities and a URI resolver should be used for all other URIs. All of them. Yes, that one. And that one. And that other one, too.
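To make the "resolve it first" idea concrete, here is a minimal Python sketch, not a full implementation of the XML Catalogs specification. It handles only exact-match `system` and `uri` entries; the catalog contents, the local file paths, and the function names `load_catalog`, `resolve_entity`, and `resolve_uri` are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: the example.org URIs and local paths are placeholders.
CATALOG_XML = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.example.org/dtd/sample.dtd"
          uri="file:///usr/local/share/xml/sample.dtd"/>
  <uri name="http://www.example.org/style/sample.xsl"
       uri="file:///usr/local/share/xml/sample.xsl"/>
</catalog>
"""


def load_catalog(xml_text):
    """Return (system_map, uri_map) built from exact-match catalog entries."""
    root = ET.fromstring(xml_text)
    system_map = {
        entry.get("systemId"): entry.get("uri")
        for entry in root.iter("{%s}system" % CATALOG_NS)
    }
    uri_map = {
        entry.get("name"): entry.get("uri")
        for entry in root.iter("{%s}uri" % CATALOG_NS)
    }
    return system_map, uri_map


SYSTEM_MAP, URI_MAP = load_catalog(CATALOG_XML)


def resolve_entity(system_id):
    """Entity resolver: map an external entity's system identifier."""
    # Unmapped identifiers pass through unchanged.
    return SYSTEM_MAP.get(system_id, system_id)


def resolve_uri(uri):
    """URI resolver: map any other URI the application is about to fetch."""
    return URI_MAP.get(uri, uri)
```

An application built this way would call `resolve_entity` or `resolve_uri` on every URI before retrieving it, and then fetch whatever comes back. URIs with no catalog entry are returned as they were, so the resolver is safe to apply everywhere.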
The XML specification gives explicit license to parsers to perform catalog resolution:
Attempts to retrieve the resource identified by a URI MAY be redirected at the parser level (for example, in an entity resolver) or below (at the protocol level, for example, via an HTTP Location: header).
To the extent that lack of such explicit license in other specifications leads users and developers to imagine that such resolution is not allowed, I think it was a mistake to call this out in the XML specification.
Your application doesn't need explicit license to use a resolver any more than it has to give me explicit license to run WWWOffle. GETing a URI? Resolve it first!
Comments
I've always been curious about the semantics (in terms of Web Architecture) of XML Catalog resolution of URIs. If an XML Catalog takes in one URI A and outputs a second URI B, what is the relationship between resource A and resource B? <A> owl:sameAs <B>? Or is the relationship at the discretion of the author of the XML Catalog under consideration?
The XML Catalogs specification does not take a position on what an alternate URI means, nor do I think it should.
While it is very often the case that the relationship is owl:sameAs, this is demonstrably not always the case. For example, on my system, all references to the DocBook 4.x DTDs are redirected to the current development version of V4. This is convenient for me (usually), but it could not reasonably be asserted that DocBook V4.1 was owl:sameAs DocBook V4.5b2.
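A redirect of this kind can be sketched in Python as prefix rewriting in the spirit of the catalog `rewriteSystem` entry, where the longest matching start string wins. This is an illustration only, not my actual catalog: the two DocBook system identifiers stand in for "all 4.x versions", and the local directory and the name `resolve_system` are hypothetical.

```python
# Hypothetical local directory holding the development DTD (placeholder path).
DEV_PREFIX = "file:///home/user/docbook/v4-dev/"

# (start string, replacement prefix) pairs, in the spirit of rewriteSystem entries.
REWRITES = [
    ("http://www.oasis-open.org/docbook/xml/4.1.2/", DEV_PREFIX),
    ("http://www.oasis-open.org/docbook/xml/4.5/", DEV_PREFIX),
]


def resolve_system(system_id):
    """Rewrite a system identifier using the longest matching start string."""
    matches = [(start, repl) for start, repl in REWRITES
               if system_id.startswith(start)]
    if not matches:
        return system_id  # no entry applies: leave the identifier alone
    start, repl = max(matches, key=lambda m: len(m[0]))
    return repl + system_id[len(start):]


v412 = resolve_system("http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd")
v45 = resolve_system("http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd")
```

Here `v412` and `v45` end up as the same local URI even though the two inputs name different DocBook releases, which is exactly why the mapping cannot be read as owl:sameAs.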
When I've looked at XML Catalog resolution of URIs, it feels somewhat like RDF. When following an arc on an RDF graph (with no literals), you start at one URI and end at another. You can think of each step in XML Catalog resolution as one step away from a resource in the traversal of an RDF graph. Naturally, the next question is then, "why not use RDF?" Here are the features that I like about XML Catalogs for this functionality, as opposed to RDF. With XML Catalogs you get a natural functional behavior; with a graph, you would have to impose additional constraints on top of the graph model. Also, RDF syntax is unwieldy for this type of functionality. XML Catalog syntax is nice and straightforward for the behavior it provides.
If we are going to use XML Catalogs for general functional mapping of URIs, then I believe we need to correctly treat both inputs and outputs to XML catalogs as URIs (when performing URI resolution). As such, I want to take this opportunity to point out a comment that I made earlier this week, asking why the uri element does not specify that both the source (attribute name) and the destination (attribute uri) are of type uri-reference. (See also the section of the XML Catalog specification discussing the uri element.) To me, this makes particular sense when you allow for resolution relationships besides owl:sameAs.