All your resolvers are belong to us

Volume 10, Issue 12; 14 Feb 2007; last modified 08 Oct 2010

Making resolvers easier for users.

In addition to the API proliferation problem I mentioned the other day, resolvers have another significant shortcoming: they're designed for programmers. That's a problem because, by and large, it isn't the programmers who need them; it's the legions of users out there wanting to make applications do the right thing.

This point was driven home quite forcefully the other day after an internal presentation I gave about the new resolver code. Santiago said, roughly, “that looks great, I'd love to use it” but when I began to explain how he could, he interrupted with “no, no, I don't want to have to set anything up, I don't want a properties file, it should just work.”

This is, in fact, manifestly the case. End users, the ones stuck with applications that need resolvers, can't usually tinker with the code involved.

We kicked around some ideas about how this might be accomplished. Eventually we came up with a plan and I have now implemented that plan. It's not a perfect plan because it does require a little bit of setup work and it does, in at least one case, break the rules.

On the other hand, it's a good plan because it should work with any Java application.

The plan

The plan is this: using the standard JAXP factory finder mechanisms, inject a special set of parser classes into the application. These classes behave exactly like the standard JAXP 1.4 parsers except that they always use a resolver.

It turns out that this works quite well. If you instantiate the resolver versions of the SAXParserFactory, XMLInputFactory, and DocumentBuilderFactory then:

The DOM LSResourceResolver will fall back to the XML catalog resolver code,
the StAX XMLResolver will fall back to the XML catalog resolver code,
the SAX EntityResolver2 will fall back to the XML catalog resolver code, and
the SAX EntityResolver will always use the XML catalog resolver code.

This is where the rules are broken. Because there's no way for the SAX EntityResolver to indicate that it didn't succeed, I never let the application specify one. Any attempt to do so is ignored and the XML catalog resolver is always used.
The NamespaceResolver isn't actually used by anyone yet, so there's no hook for it in the platform. Any application that uses it will be using the XML catalog resolver code, at least for now.
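
To make "fall back" concrete, here is a minimal sketch of the pattern for the StAX case. It is not the code from the factories; the class name FallbackXMLResolver and both resolvers in demo() are made up for illustration. It relies only on the documented StAX convention that an XMLResolver returns null when it declines to resolve something.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;

/** Sketch only: try the application's resolver first, then a fallback. */
public class FallbackXMLResolver implements XMLResolver {
    private final XMLResolver application; // set by the application; may be null
    private final XMLResolver fallback;    // stand-in for the catalog resolver

    public FallbackXMLResolver(XMLResolver application, XMLResolver fallback) {
        this.application = application;
        this.fallback = fallback;
    }

    @Override
    public Object resolveEntity(String publicID, String systemID,
                                String baseURI, String namespace)
            throws XMLStreamException {
        if (application != null) {
            Object result = application.resolveEntity(publicID, systemID, baseURI, namespace);
            if (result != null) {
                return result; // the application resolved it, so we are done
            }
        }
        // null means "not resolved", so hand the request to the fallback
        return fallback.resolveEntity(publicID, systemID, baseURI, namespace);
    }

    /** Hypothetical demonstration: true if the fallback answered the request. */
    public static boolean demo() {
        InputStream local = new ByteArrayInputStream(new byte[0]);
        XMLResolver declines = (p, s, b, n) -> null;  // application resolver that gives up
        XMLResolver catalog = (p, s, b, n) -> local;  // pretend catalog lookup
        try {
            Object result = new FallbackXMLResolver(declines, catalog)
                    .resolveEntity(null, "http://example.com/sample.dtd", null, null);
            return result == local;
        } catch (XMLStreamException e) {
            return false;
        }
    }
}
```

The application's resolver always gets the first chance, and the fallback only runs when the application's resolver returns null.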

The catch

There is a catch. You knew there would be a catch, didn't you? The catch is that I constructed these specialty factories by stealing code from the JAXP 1.4 implementation. But I didn't want to steal all the code, so the result depends on JAXP 1.4.

In practice, this means that you'll need to be running at least Java 6 or using the standalone JAXP 1.4 code developed in the GlassFish Community.

Making it work

To try out this new feature, you do need to do a few things (sorry, Santiago, TANSTAAFL):

Make sure that you're using JAXP 1.4, either because you're using Java 6 or because you've got the JAXP 1.4 jar files in one of your java.endorsed.dirs.
Download both the XML Resolver and the XML Resolver JAXP 1.4 Factories from the XMLResolver project.
Put both of those jar files on your class path.

Run your Java application with the following system property settings:

javax.xml.parsers.SAXParserFactory=org.xmlresolver.sunjaxp.jaxp.SAXParserFactoryImpl
javax.xml.stream.XMLInputFactory=org.xmlresolver.sunxml.stream.XMLInputFactoryImpl
javax.xml.parsers.DocumentBuilderFactory=org.xmlresolver.sunjaxp.jaxp.DocumentBuilderFactoryImpl
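
If you happen to control the application's startup code, the same three settings can probably be made programmatically instead of on the command line. This is a minimal sketch, assuming it runs before the application asks JAXP for its first factory; the class name ResolverFactoryProperties is made up for illustration, and the property values are copied from the list above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: installs the factory settings as system properties. */
public class ResolverFactoryProperties {

    /** The three property settings, exactly as listed in the text. */
    public static Map<String, String> settings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("javax.xml.parsers.SAXParserFactory",
                "org.xmlresolver.sunjaxp.jaxp.SAXParserFactoryImpl");
        settings.put("javax.xml.stream.XMLInputFactory",
                "org.xmlresolver.sunxml.stream.XMLInputFactoryImpl");
        settings.put("javax.xml.parsers.DocumentBuilderFactory",
                "org.xmlresolver.sunjaxp.jaxp.DocumentBuilderFactoryImpl");
        return settings;
    }

    /** Copies the settings into the JVM-wide system properties. */
    public static void install() {
        settings().forEach(System::setProperty);
    }
}
```

Call ResolverFactoryProperties.install() first thing in main(). After that, a call such as SAXParserFactory.newInstance() will look for the named classes, so both jar files must already be on the class path or the lookup will fail.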

I haven't set up a special factory for the Transformer yet, so you'll still have to tell your application which URI Resolver to use if you're using XSLT. (I might fix that as well, depending on just how many classes I have to copy to make that work.)
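
For the XSLT case, telling the application which resolver to use comes down to the standard JAXP call TransformerFactory.setURIResolver. A minimal sketch follows; the class name XsltResolverSetup is made up, and the resolver is left as a parameter because which URIResolver implementation you pass in is up to you.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;

/** Sketch only: a TransformerFactory that uses the given URIResolver. */
public class XsltResolverSetup {

    public static TransformerFactory newFactory(URIResolver resolver) {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Used by default for xsl:import, xsl:include, and document()
        // by transformers created from this factory.
        factory.setURIResolver(resolver);
        return factory;
    }
}
```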

Naturally, I don't expect these alternate factories to have any consequences except improved resolver support. If you do have trouble, please let me know.

Other Changes

The other significant change I made was to enable caching by default. This means that you don't need to have a properties file; there are sensible defaults for all of the properties, including the cache. The cache directory defaults to .xmlresolver/cache in your home directory. Of course, you can still override all the defaults with property settings.

If you only care about the updated resolver and don't want to try the factories trick, you only need the XML Resolver jar file; you can ignore the XML Resolver JAXP 1.4 Factories.

Debugging

The resolver uses the java.util.logging framework. By default, it only logs those few messages of priority “info” or higher to the console.

If you want more details, specify the verbosity property (either in a properties file or as a system property, xml.catalog.verbosity). A setting of “fine” or even “finer” will give much more information about what the resolver is doing.

Note that the Java console logger will only print messages of priority “info” or higher too, so if you want to see the finer messages, you also have to tell the logger to allow them through. I use

java.util.logging.config.file="/home/ndw/java/logging.properties"

Setting this information in two places is confusing and I'll probably unify it all into the logging framework for the next release.
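
If you'd rather not maintain a logging configuration file, both settings can be sketched in code. Treat this as an illustration under assumptions: the class name VerboseResolverLogging is made up, the logger name is a parameter because the name the resolver actually logs under isn't stated here, and setting xml.catalog.verbosity at run time only helps if it happens before the resolver reads its properties.

```java
import java.util.Locale;
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Sketch only: lets messages below "info" reach the console. */
public class VerboseResolverLogging {

    /**
     * Raises verbosity in both places: the resolver's own property and
     * the java.util.logging console handler. Keep a reference to the
     * returned logger so its configuration is not garbage collected.
     */
    public static Logger enable(String loggerName, Level level) {
        // The resolver's verbosity property, e.g. "fine" or "finer".
        System.setProperty("xml.catalog.verbosity",
                level.getName().toLowerCase(Locale.ROOT));

        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(level);

        // The default console handler stops at INFO, so add one that doesn't.
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(level);
        logger.addHandler(handler);

        // Avoid printing INFO-and-higher messages twice via the root handler.
        logger.setUseParentHandlers(false);
        return logger;
    }
}
```

For example, VerboseResolverLogging.enable("org.xmlresolver", Level.FINE), where "org.xmlresolver" is only a guess at the logger name based on the package names above.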

It all works for me, but I've been hacking pretty hard. Feedback of any sort is always welcome.

Comments

This is very cool, Norm. I haven't given it a try yet, but I must.

I'm not a Java expert, but wouldn't it be possible to store the Java properties that select the parser factories inside the .jar file, so that setup would just be a matter of putting the XML Resolver and Resolver factories JARs on the classpath? That way installation would be even less difficult.

I was going to try this out, but I couldn't see how to specify where the catalog file resides. Do I still use a CatalogManager.properties file for such?

Yes, Bob. Documentation, such as it is, is in the JavaDoc.

Thanks for making this code available. I'm hoping it will allow me to finally fulfill the following fantasy. For some reason I've got it in my head that this is "the way things are supposed to work", but despite all my digging I can't seem to find anybody who's actually doing this. It goes:

1. I have a W3C Schema document myschema.xsd. It starts off like:

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://my.tld/something" targetNamespace="http://my.tld/something" elementFormDefault="qualified" attributeFormDefault="unqualified"
  version="1.0" xml:lang="EN">

2. I have an instance document mydocument.xml. It starts off like:

  <something xmlns="http://my.tld/something">

3. I have a catalog file. It says that whenever somebody finds a document/element that says it belongs to namespace "http://my.tld/something" and wants to validate it, it should retrieve the schema from myschema.xsd.

4. I can give this to somebody else and they can move the schema document to wherever they want, and just edit the catalog entry.

5. There doesn't have to be anything at http://my.tld/something. It doesn't even have to exist.

6. I don't have to use xsi:schemaLocation location hints in my instance documents. The namespace URI is enough to trigger all of this.

7. My application doesn't have to have any special knowledge of the namespaces that will be used. I want to potentially be able to validate content that belongs to namespaces I don't know anything about.

This is the whole idea of catalogs, right? But I just can't seem to get it to work, either with the older XML resolver stuff or this new code. I've been trying with Xerces's DOM package, with JDOM, with Xerces SAX, with Xerces LSParser....just always running into trouble. Generally it seems that the validation isn't getting triggered -- the catalog gets read, but the association between the namespace URI and the schema location just doesn't take. I'm using the <uri/> elements with the RDDL annotations noted above.

I'm willing to help out with finding bugs, but I'd just like to know whether I'm totally off the mark with my understanding of how things are intended to work.

Thanks!

I've sent you a patch that makes the namespace resolver stuff get used in my scenario. Thanks -- this is great!

I'll take a look at your patch, but the problem I see with your scenario is that (to date) none of the APIs out there make use of the RDDL information.

That is, the W3C XML Schema validator just asks for the namespace URI. It doesn't ask for the namespace URI for the purpose of validation. If it did, then everything would work just as you suggest.

That's what the resolveNamespace API is for.

I also expected things to work as Noel Bush describes above and it has been difficult for me to recognize and accept that they _don't_ work that way. Only today did I google RDDL to find this explanation that corrects one of my misunderstandings. Still, AFAIK, the schemaLocation and noNamespaceSchemaLocation attributes from http://www.w3.org/2001/XMLSchema-instance associate a uri with a url; it seems logical that the associated url might go through the catalog resolver; another fantasy I suppose. For now it's back to work for me...