Caching in with Resolvers

Volume 6, Issue 31; 05 Jun 2003

XML Catalogs is now a Committee Specification. We're well on our way to OASIS Standard, I think, and that means it's time to get your deployment strategies in order.

Everything in the universe goes by indirection. There are no straight lines.


On Tuesday afternoon, moments before I darted out of town for a couple of days, we voted to take the current XML Catalogs draft to Committee Specification. This is almost the penultimate step toward becoming an OASIS Standard (voting membership willing, of course). If you've been using catalogs, that means it's time to get your deployment strategies in order. If you haven't, that means it's high time you did!

XML Catalogs is the principal work product of the OASIS Entity Resolution Technical Committee, chaired by the inimitable Lauren Wood, and for which I am the humble editor.

The “elevator speech” for catalogs goes like this: all sorts of critical resources are identified by URI these days (schemas, stylesheets, DTDs, RDF grammars, etc.). (That's a good thing! I wish we had public identifiers for them as well, but that's a topic for another essay.) As long as you're connected to the net, everything works perfectly. But what happens when you're disconnected, either because part of the net is down or because you've unplugged your laptop and taken it to 30,000 feet for some transcontinental journey? Suddenly, the fact that you need to get your work done is a frustrating complication. The problem often isn't that you don't have the relevant document locally; the problem is that your system identifiers, schema location attributes, and what have you all point to the web. Sure, you could change them all to point to the local copies, but then it's a pain to share documents with colleagues and you have to do this hacking over and over again. What you need is a transparent way of remapping those identifiers to the appropriate local copies. And that's exactly what XML Catalogs give you: a flexible, transparent mapping of resources.

And the really good news: XML Catalogs are already widely supported. For example, almost all Java tools can use the resolver classes from the Apache XML Commons project, and all Gnome “libxml”-based tools support them.

Let's look at two common scenarios:

Resolving Standard Resources

Suppose you have a few hundred DocBook documents lying around. You'd never dream of working with documents that weren't validated (you wouldn't, would you!?), so they all begin with the relevant document type declaration:

<!DOCTYPE book
  PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"

Every time you parse one of these documents, the parser goes out to the web and drags the whole DocBook DTD down from the OASIS site. Even with a fast connection, that probably takes a few seconds. And like I said, it only works at all if you can actually get to the OASIS web site.

All that downloading is probably entirely unnecessary. If you're editing that many DocBook documents, it's likely that you've got a copy of the DTD on your machine. For the sake of argument, let's say it's stored in /docbook/4.2/.

That means if you set up the following catalog (in practice, you probably want to map the system identifier as well, and maybe a few other things, like the entity sets, but for the sake of simplicity, and to keep the code listing narrow, I'm omitting those for the time being):


  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"

and point your applications at it, all those net accesses for the DocBook DTD will transparently vanish. Everything will run faster, and it'll run just as well at 30,000 feet as it will plugged into the net on your desk.
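To make the mapping concrete, here is a minimal sketch in Python of what a resolver does with a `public` entry. It is an illustration only, not how the Apache or libxml resolvers are implemented: the function names are hypothetical, the local file name `docbookx.dtd` under /docbook/4.2/ is an assumption, and a real resolver also normalizes identifiers and handles many more entry types.

```python
import xml.etree.ElementTree as ET

# Namespace used by OASIS XML Catalogs documents.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Assumed catalog: the file name docbookx.dtd is a guess at what lives
# under /docbook/4.2/ and is not taken from the article.
CATALOG_XML = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//OASIS//DTD DocBook XML V4.2//EN"
          uri="file:///docbook/4.2/docbookx.dtd"/>
</catalog>
"""


def load_public_entries(catalog_text):
    """Return a dict mapping each publicId in the catalog to its uri."""
    root = ET.fromstring(catalog_text)
    entries = {}
    for entry in root.iter("{%s}public" % CATALOG_NS):
        entries[entry.get("publicId")] = entry.get("uri")
    return entries


def resolve_public(entries, public_id, system_id):
    """Use the catalog's local copy if there is one; else keep system_id."""
    return entries.get(public_id, system_id)


if __name__ == "__main__":
    entries = load_public_entries(CATALOG_XML)
    # The system identifier here is a placeholder, not the real OASIS one.
    print(resolve_public(entries,
                         "-//OASIS//DTD DocBook XML V4.2//EN",
                         "http://example.org/remote/docbookx.dtd"))
```

An identifier that the catalog doesn't mention falls through to its original system identifier, which is what makes the mapping transparent: documents without a catalog entry behave exactly as they did before.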

Resolving A Development Resource

Catalogs can be really convenient not only for stable resources, but for development resources as well. Here's an example that I use every day.

I work on a large set of stylesheets for DocBook. I extend and customize the base stylesheets with fair regularity. The XML Catalogs specification, for example, is generated with a customization on top of the base DocBook stylesheets.

That means I have a lot of stylesheets that begin like this:




They start by importing the base DocBook stylesheets from their public location (though I've abbreviated the URI in the example). The problem is, when I'm fixing bugs or doing development work, I don't really want to get the public version, I want to get my local development version.

I could change all the href attributes to point to the local copies, but that'd be a maintenance nightmare. If they pointed to local copies, they wouldn't work for anyone but me, so I'd have to remember to make them all point to the public location before each release.

The solution is a “development catalog”:


  <uri name=""

This catalog maps the public URI to my local development copy. By using that catalog, I get to pretend that I've published my local version. (Every access to the public version is transparently mapped to my local development copy.)
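Here is a small Python sketch of that `uri` lookup, under stated assumptions: both URIs in the catalog are placeholders I made up for illustration (the real ones are abbreviated above), and `resolve_uri` is a hypothetical helper, not part of any resolver library.

```python
import xml.etree.ElementTree as ET

# Namespace used by OASIS XML Catalogs documents.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical development catalog: both the public URI and the local
# path are placeholders, not the author's actual locations.
DEV_CATALOG_XML = """\
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://example.org/xsl/docbook.xsl"
       uri="file:///home/me/dev/xsl/docbook.xsl"/>
</catalog>
"""


def resolve_uri(catalog_text, requested):
    """Return the catalog's mapping for requested, or requested unchanged."""
    root = ET.fromstring(catalog_text)
    for entry in root.iter("{%s}uri" % CATALOG_NS):
        if entry.get("name") == requested:
            return entry.get("uri")
    # No matching entry: the caller fetches the original URI as usual.
    return requested


if __name__ == "__main__":
    print(resolve_uri(DEV_CATALOG_XML, "http://example.org/xsl/docbook.xsl"))
```

Because the stylesheets themselves still name the public URI, switching between the published and development versions is just a matter of whether the development catalog is in use.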

Catalogs solve problems for me every day. They can solve problems for you too.