<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
    
    
    
    
    
    
    
    
    
    
    
    
<title>Building a better resolver</title><biblioid class="uri">http://norman.walsh.name/2007/02/06/xmlresolver</biblioid>
<volumenum>10</volumenum>
<issuenum>8</issuenum>
<pubdate>2007-02-06T09:17:31-05:00</pubdate>
<date>$Date: 2007-02-06 19:00:03 -0500 (Tue, 06 Feb 2007) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2007</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>I've been working on a reimplementation of my XML
Catalog-based entity/URI resolver. It has a more sensible design, includes a
caching feature, and supports a new API for dealing with XML Namespace
names.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#GlassFish"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#Java"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XMLCatalogs"/>
</info>

<para xml:id="p1">The first substantial body of Java code released
into the wild with my name on it was the entity resolver code that
eventually made its way into the
<link xlink:href="http://xml.apache.org/commons/">Apache XML Commons</link>
project.</para>

<para xml:id="p2">The origins of that code stretch back at least six years, maybe
closer to ten. Its design seems…“odd”, at best, from a modern
perspective, but it's been in use for a long time and a lot of people
use it every day. Some of the features that I implemented in that code
eventually made it into the
<link xlink:href="http://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html">XML Catalogs V1.1</link>
Standard.</para>

<para xml:id="p3">One of the complaints raised against the catalog-based approach
to URI management is that the end user has to write and maintain the
catalog. On some systems, the catalog is automatically updated for
packages that are installed locally, but that doesn't address the
issue of random web resources accessed by the user.</para>

<para xml:id="p4">If the resolver doesn't find an entry in the catalog for the
resource requested, it goes out to the web and fetches it, so there's
a straightforward and obvious way to attack the manual maintenance
issue: have the resolver cache the resources that it fetches.
I've been meaning to implement caching for years.</para>

<para xml:id="p5">Another
feature that occurred to me more recently is improved support for XML
Namespace names. The <wikipedia>RDDL</wikipedia>
approach of assigning a nature and purpose
to a Namespace URI can easily be implemented as an XML Catalog extension.
</para>

<para xml:id="p6">A few weeks ago, I set out to refactor the resolver and add these
features. The first fruit of that effort is now available at
<link xlink:href="http://xmlresolver.dev.java.net/"/>.</para>

<para xml:id="p7">Feature-wise, the new resolver:</para>

<itemizedlist>
<listitem>
      <para xml:id="p8">Is backwards compatible with the existing catalog resolver
in Apache XML Commons.
</para>
    </listitem>
<listitem>
      <para xml:id="p9">Supports automatic caching of resources retrieved from the web,
eliminating the need for manual catalog maintenance.
</para>
    </listitem>
</itemizedlist>

<para xml:id="p10">Implementation-wise, the new resolver:</para>

<itemizedlist>
<listitem>
      <para xml:id="p11">Abandons the complex internal data structures used to
represent catalogs. Each catalog is simply loaded as a DOM. This
greatly simplifies the code and makes implementing extensions much
more practical.
</para>
    </listitem>
<listitem>
      <para xml:id="p12">Uses the
<package xlink:href="http://java.sun.com/javase/6/docs/api/java/util/logging/package-summary.html">java.util.logging</package> framework instead of a home-grown logging
class.
</para>
    </listitem>
<listitem>
      <para xml:id="p13">Supports only OASIS XML Catalogs. It wouldn't be
impossible to add support for other catalog formats, but I don't have
any immediate plans to do so.
</para>
    </listitem>
<listitem>
      <para xml:id="p14">Has a more sensible design with three levels of
catalog resolution: a simple, string-based lookup service that
interrogates the catalog and determines what mapping, if any, the
catalog specifies; a resolver that returns ordinary Java <code>InputStreams</code>; and a
resolver that returns XML <code>Source</code> and
<code>InputSource</code> objects.
</para>
    </listitem>
<listitem>
      <para xml:id="p15">Supports a new
<interfacename xlink:href="#resolveNamespace">NamespaceResolver</interfacename> interface
to retrieve resources associated with XML Namespace names.
</para>
    </listitem>
<listitem>
      <para xml:id="p16">Represents web resources (and catalog results) with a URI,
a MIME type, and a content body.
</para>
    </listitem>
<listitem>
      <para xml:id="p17">Is thread-safe so that a single resolver instance can
be shared across an entire application.
</para>
    </listitem>
<listitem>
      <para xml:id="p18">Uses file-based locking to ensure that a single cache can be
shared across an entire application, or even across multiple applications,
possibly running in different VMs.
</para>
    </listitem>
<listitem>
      <para xml:id="p19">Includes a (not quite complete) set of unit tests for catalog
lookup results.
</para>
    </listitem>
</itemizedlist>
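
<para>The three levels of catalog resolution can be pictured as layers
that build on one another. What follows is a minimal sketch under stated
assumptions, not the code that ships in
<filename>xmlresolver.jar</filename>: the class
<classname>LayeredResolverSketch</classname> and its method names are
hypothetical, and a <classname>Map</classname> stands in for the
DOM-backed catalog.</para>

<programlisting language="java"><![CDATA[import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.InputSource;

// Hypothetical names throughout; this is not the org.xmlresolver API.
class LayeredResolverSketch {
    // Assumption: a Map stands in for the DOM-backed catalog.
    private final Map<String, String> catalog = new HashMap<>();

    void addUriEntry(String name, String uri) {
        catalog.put(name, uri);
    }

    // Level 1: string-based lookup. Returns the mapping, or null if none.
    String lookupUri(String name) {
        return catalog.get(name);
    }

    // Level 2: open a stream on the mapped URI, or on the original URI
    // when the catalog has nothing to say about it.
    InputStream openStream(String name) {
        String mapped = lookupUri(name);
        String target = (mapped != null) ? mapped : name;
        try {
            return URI.create(target).toURL().openStream();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Level 3: wrap the stream in the object an XML parser expects.
    InputSource resolveEntity(String systemId) {
        InputSource source = new InputSource(openStream(systemId));
        source.setSystemId(systemId);
        return source;
    }

    // Maps a web URI to a temporary local file and reads it back
    // through all three levels, without touching the network.
    static String demo() {
        try {
            Path local = Files.createTempFile("sketch", ".dtd");
            Files.write(local, "<!ELEMENT doc (#PCDATA)>".getBytes(StandardCharsets.UTF_8));
            LayeredResolverSketch resolver = new LayeredResolverSketch();
            resolver.addUriEntry("http://example.com/doc.dtd", local.toUri().toString());
            InputSource source = resolver.resolveEntity("http://example.com/doc.dtd");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = source.getByteStream()) {
                byte[] buffer = new byte[256];
                for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                    out.write(buffer, 0, n);
                }
            }
            Files.delete(local);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}]]></programlisting>

<para>Keeping the string lookup separate means that a caller who only
wants to know what the catalog says never has to open a stream or touch
an XML API.</para>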

<section xml:id="install">
<title>Using the resolver</title>

<para xml:id="p20">If you've never used a resolver before, simply put the
<filename>xmlresolver.jar</filename> file on your
<envar>CLASSPATH</envar>, instantiate a
<code xlink:href="https://xmlresolver.dev.java.net/nonav/javadoc/org/xmlresolver/Resolver.html">org.xmlresolver.Resolver</code>
object, and use it as the
<link xlink:href="http://java.sun.com/javase/6/docs/api/org/xml/sax/EntityResolver.html">entity resolver</link> or
<link xlink:href="http://java.sun.com/javase/6/docs/api/javax/xml/transform/URIResolver.html">URI resolver</link>
in your application.</para>

<para xml:id="p21">As a convenience, you can simply instantiate a
<code xlink:href="https://xmlresolver.dev.java.net/nonav/javadoc/org/xmlresolver/tools/ResolvingXMLReader.html">org.xmlresolver.tools.ResolvingXMLReader</code>.
That
implementation of an
<classname xlink:href="http://java.sun.com/javase/6/docs/api/org/xml/sax/XMLReader.html">XMLReader</classname> will automatically
use the resolver.</para>
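
<para>As a rough illustration of where an entity resolver plugs into
standard JAXP parsing, here is a minimal sketch. To keep it runnable
with nothing but the JDK, a small stand-in
<interfacename>EntityResolver</interfacename> is used; with
<filename>xmlresolver.jar</filename> on the class path you would
presumably pass an <code>org.xmlresolver.Resolver</code> instance
instead (how that object is constructed is an assumption here). The
class name <classname>ResolverUsageSketch</classname> and the example
URI are hypothetical.</para>

<programlisting language="java"><![CDATA[import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper class; only the JAXP and SAX calls are real API.
class ResolverUsageSketch {
    // Parses the XML text with the given entity resolver installed.
    static Document parse(String xml, EntityResolver resolver) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver(resolver);
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static String demo() {
        // Stand-in resolver: serves a tiny DTD from memory so that the
        // parser never goes to the network. Assumption: an
        // org.xmlresolver.Resolver instance would take its place.
        EntityResolver standIn = (publicId, systemId) ->
            new InputSource(new StringReader("<!ENTITY greeting \"hello\">"));

        String xml = "<!DOCTYPE doc SYSTEM \"http://example.com/doc.dtd\">"
                   + "<doc>&greeting;</doc>";
        return parse(xml, standIn).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}]]></programlisting>

<para>For XSLT, the analogous hook is
<methodname>setURIResolver</methodname> on a
<classname>TransformerFactory</classname> or
<classname>Transformer</classname>.</para>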

<para xml:id="p22">(The next release will probably include more convenience
features, including some code to plug into the standard
<wikipedia>JAXP</wikipedia> factory mechanism, making it trivial to add
the resolver to <emphasis>all</emphasis> parsers used by any application.)</para>

</section>

<section xml:id="upgrading">
<title>Upgrading to the new code</title>

<para xml:id="p23">If you've been using the XML Commons resolver in your
application, the new code is designed to be backwards compatible.
Simply put the <filename>xmlresolver.jar</filename> file on your
<envar>CLASSPATH</envar> and use
<code>org.xmlresolver.<replaceable>*</replaceable></code> instead of
<code>org.apache.xml.resolver.<replaceable>*</replaceable></code>.
</para>

<para xml:id="p24">For example, if you've been running:</para>

<programlisting>java … com.saxonica.Transform \
        -x org.apache.xml.resolver.tools.ResolvingXMLReader \
        -y org.apache.xml.resolver.tools.ResolvingXMLReader \
        -r org.apache.xml.resolver.Resolver \
        …</programlisting>

<para xml:id="p25">You can run this instead:</para>

<programlisting>java … com.saxonica.Transform \
        -x org.xmlresolver.tools.ResolvingXMLReader \
        -y org.xmlresolver.tools.ResolvingXMLReader \
        -r org.xmlresolver.Resolver \
        …</programlisting>

<para xml:id="p26">Other Java tools may have similar options.</para>
</section>

<section xml:id="caching">
<title>Enabling the cache</title>

<para xml:id="p27">To use the new caching feature, you have to enable it
explicitly. Caching requires write access to a cache directory, which you
must identify through
<link xlink:href="https://xmlresolver.dev.java.net/nonav/javadoc/org/xmlresolver/Catalog.html">a
Catalog property</link>. Note that this directory should be under the
exclusive control of the resolver.</para>

<para xml:id="p28">The format of the caching control file is described briefly
<link xlink:href="https://xmlresolver.dev.java.net/nonav/javadoc/org/xmlresolver/ResourceCache.html">in
the JavaDoc</link>.</para>
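
<para>The general idea of caching on a miss, guarded by a file lock so
that several processes can share one directory, can be sketched as
follows. This is an illustration only and makes no claim about the
resolver's actual cache layout, control file, or locking strategy: the
class <classname>CacheSketch</classname>, the file names, and the
<code>fetcher</code> function that stands in for the web are all
hypothetical.</para>

<programlisting language="java"><![CDATA[import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Function;

// Hypothetical sketch; not the org.xmlresolver cache implementation.
class CacheSketch {
    private final Path cacheDir;
    private final Function<String, byte[]> fetcher; // stands in for the web
    private int fetchCount = 0;

    CacheSketch(Path cacheDir, Function<String, byte[]> fetcher) {
        this.cacheDir = cacheDir;
        this.fetcher = fetcher;
    }

    // Returns the cached copy of the URI, fetching and storing it on a miss.
    // "synchronized" serializes threads that share this instance; the file
    // lock serializes other processes that share the same directory.
    synchronized byte[] get(String uri) {
        // Simplification: a real cache needs collision-free entry names.
        Path entry = cacheDir.resolve(Integer.toHexString(uri.hashCode()) + ".bin");
        Path lockFile = cacheDir.resolve("cache.lock");
        try (FileChannel channel = FileChannel.open(lockFile,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            if (!Files.exists(entry)) {
                fetchCount++;
                Files.write(entry, fetcher.apply(uri));
            }
            return Files.readAllBytes(entry);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int fetchCount() {
        return fetchCount;
    }

    // Requests the same URI twice and reports how often it was fetched.
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("cache-sketch");
            CacheSketch cache = new CacheSketch(dir,
                uri -> ("<!-- body of " + uri + " -->").getBytes(StandardCharsets.UTF_8));
            cache.get("http://example.com/doc.dtd");
            cache.get("http://example.com/doc.dtd");
            return cache.fetchCount();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}]]></programlisting>

<para>Java file locks are held on behalf of the whole virtual machine,
so they coordinate separate processes but not threads within one
process; that is why the sketch also synchronizes the method.</para>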
</section>

<section xml:id="resolveNamespace">
<title>Resolving XML Namespaces</title>

<para xml:id="p29">The resolver proposes a new interface,
<interfacename xlink:href="https://xmlresolver.dev.java.net/nonav/javadoc/org/xmlresolver/NamespaceResolver.html">NamespaceResolver</interfacename>,
with a single method,
<methodname>resolveNamespace</methodname>. The method takes three
parameters: an absolute Namespace URI, a nature, and a purpose. The
method returns a resource associated with the namespace URI that has
the specified nature and purpose.
If no matching resource can be found, the document at the namespace
URI is returned.</para>

<para xml:id="p30">The catalog can identify the nature and purpose of a URI with extension
attributes:</para>

<programlisting>&lt;uri xmlns:r="http://www.rddl.org/"
     name="http://www.w3.org/2001/XMLSchema"
     r:nature="http://www.w3.org/2001/XMLSchema"
     r:purpose="http://www.rddl.org/purposes#schema-validation"
     uri="/cache/xrc1234.xsd"/&gt;</programlisting>

<para xml:id="p31">If there isn't a match, the resolver attempts to parse the
namespace document as a RDDL document (1.0 for the moment, though I
plan to support more) and find the match that way.</para>
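
<para>Because each catalog is simply a DOM, matching an entry like the
one above takes only a few lines. The following is a minimal sketch of
the catalog half of the lookup, not the resolver's own code: the class
<classname>NamespaceLookupSketch</classname> and its
<methodname>lookup</methodname> method are hypothetical, the
<tag>uri</tag> element is assumed to be in the OASIS catalog namespace,
and the raw <tag class="attribute">uri</tag> attribute is returned
without being resolved against a base URI.</para>

<programlisting language="java"><![CDATA[import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; not the org.xmlresolver NamespaceResolver interface.
class NamespaceLookupSketch {
    static final String CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog";
    static final String RDDL_NS = "http://www.rddl.org/";

    // A one-entry catalog mirroring the example entry in the text.
    static final String SAMPLE =
          "<catalog xmlns='urn:oasis:names:tc:entity:xmlns:xml:catalog'"
        + " xmlns:r='http://www.rddl.org/'>"
        + "<uri name='http://www.w3.org/2001/XMLSchema'"
        + " r:nature='http://www.w3.org/2001/XMLSchema'"
        + " r:purpose='http://www.rddl.org/purposes#schema-validation'"
        + " uri='/cache/xrc1234.xsd'/>"
        + "</catalog>";

    // Parses catalog text into a namespace-aware DOM.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the uri attribute of the first entry whose name, nature,
    // and purpose all match, or null if the catalog has no such entry.
    static String lookup(Document catalog, String namespace,
                         String nature, String purpose) {
        NodeList entries = catalog.getElementsByTagNameNS(CATALOG_NS, "uri");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            if (namespace.equals(entry.getAttribute("name"))
                    && nature.equals(entry.getAttributeNS(RDDL_NS, "nature"))
                    && purpose.equals(entry.getAttributeNS(RDDL_NS, "purpose"))) {
                return entry.getAttribute("uri");
            }
        }
        return null;
    }

    static String demo() {
        return lookup(parse(SAMPLE),
                      "http://www.w3.org/2001/XMLSchema",
                      "http://www.w3.org/2001/XMLSchema",
                      "http://www.rddl.org/purposes#schema-validation");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}]]></programlisting>

<para>A <literal>null</literal> result corresponds to the “no match”
case, at which point a caller would fall back to the namespace document
itself.</para>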
</section>

<section xml:id="disclaimer">
<title>Disclaimer</title>

<para xml:id="p32">I've been running this code for a week or two “in production” on
my laptop. It seems to work for me, but I wouldn't put it into
production use anywhere else without careful consideration. It's quite
likely that some of the work to make it thread-safe is incomplete.
It's not documented very well yet. In short: it's beta. Your mileage
may vary. It may not work. It may work badly. It's not my fault.</para>

<para xml:id="p33">Share and enjoy.</para>
</section>
</essay>

