<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       xmlns:foaf="http://xmlns.com/foaf/0.1/"
       xml:lang="en"
       version='5.0'>
<info>
<title>Tread lightly</title>
<volumenum>10</volumenum>
<issuenum>89</issuenum>
<pubdate>2007-09-07T11:24:12-04:00</pubdate>
<date>$Date$</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2007</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Take advantage of the catalog resolver built into GlassFish to
treat your neighbors more gently and maybe improve performance.</para>
</abstract>
</info>

<para xml:id='p1'>By its very nature, the web encourages dereference of
URIs. This is a good thing: it's how we surf the web in our browser of
choice, and it's how web applications take advantage of distributed
resources.</para>

<para xml:id='p2'>The more popular a resource is, the more likely it is to get
dereferenced. This, too, is usually a good thing. Lots of folks keep track
of the number of “hits” they get (for weblog postings,
press releases, product downloads, etc.). More is better.</para>

<para xml:id='p3'>But it is possible to get too much of a good thing, especially
where web applications are concerned. A popular web application can
hit a resource thousands of times an hour (maybe more), faster than even the most
caffeine-fueled web surfer.</para>

<para xml:id='p4'>The <wikipedia page="World_Wide_Web_Consortium">W3C</wikipedia>,
for example, gets an astonishing number of hits for DTDs
(especially the HTML and XHTML DTDs), schemas, and namespace
documents. So many, in fact, that sometimes it looks like a <wikipedia
page="Denial-of-service_attack">denial-of-service attack</wikipedia>.
And sometimes that'll get you locked out completely for several
days.</para>

<para xml:id='p5'>Providing scalable access to web resources is
not a simple problem. There are a number of ways it can be approached at a
number of different levels in the web architecture stack. The W3C
<wikipedia>Technical Architecture Group</wikipedia> has agreed to
investigate the issue.</para>

<para xml:id='p6'>In the meantime, if you're writing <wikipedia>GlassFish</wikipedia>
servlets or other applications that are performing XML processing, you
can take advantage of the
<link xlink:href="http://www.oasis-open.org/committees/download.php/14809/xml-catalogs.html">XML
Catalog<alt>XML Catalogs OASIS Standard V1.1, 7 October 2005</alt></link>
resolver built into GlassFish to
directly reduce the burden your applications are creating.
(Never heard of XML Catalogs? I
<link xlink:href="/2003/06/05/xmlcatalogs">wrote some background information</link>
a while back.)</para>

<para xml:id='p7'>The secret is to make local copies of static resources and then
tell the catalog resolver to use them instead. I'm going to use the
XHTML DTD for my example, but it applies to any web resource that your
application might be accessing.</para>

<para xml:id='p8'>The first step is to make local copies of the representations you
need and then create an XML Catalog for them. In this case, I grabbed
the
<link xlink:href="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">xhtml1-transitional.dtd</link> file and the entities it relies on and built this
catalog:</para>

<programlisting><![CDATA[<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"
         prefer="public">

  <public publicId="-//W3C//DTD XHTML 1.0 Transitional//EN"
          uri="xhtml1-transitional.dtd"/>

  <public publicId="-//W3C//ENTITIES Latin 1" uri="xhtml-lat1.ent"/>
  <public publicId="-//W3C//ENTITIES Symbols" uri="xhtml-symbol.ent"/>
  <public publicId="-//W3C//ENTITIES Special" uri="xhtml-special.ent"/>
</catalog>]]></programlisting>

<para xml:id='p9'>The next step relies on the fact that GlassFish ships with the
entity resolver that was developed as part of the Apache XML Commons
project. All you have to do is make sure that it's used as the
entity resolver or URI resolver by your application.</para>

<para xml:id='p10'>In the interest of choosing a simple, common example, let's look
at how we can use this resolver if we're parsing a document with
SAX.</para>

<para xml:id='p11'>First, make sure that you're importing the resolver:</para>

<programlisting>import com.sun.org.apache.xml.internal.resolver.tools.CatalogResolver;
</programlisting>

<para xml:id='p12'>I know this ships with GlassFish, but if you're uncomfortable with the
slightly dubious practice of relying on “private” classes, you can install
the <link xlink:href="http://xml.apache.org/commons/">standard distribution</link>
yourself.</para>

<para xml:id='p13'>(And if you're using <wikipedia>NetBeans</wikipedia>, you can
skip right over the import step; NetBeans will suggest the import for
you when you need it and stick it in the right place.)</para>

<para xml:id='p14'>Next, make yourself a catalog resolver:</para>

<programlisting>CatalogResolver resolver = new CatalogResolver();</programlisting>

<para xml:id='p15'>And finally, make sure it gets used. I set up my own SAX handler
and used it in there:</para>

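<programlisting>private class MyHandler extends DefaultHandler {
    // Consult the catalog first. A null result (no catalog match) makes
    // the parser fall back to its default lookup of the system identifier.
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException, SAXException {
        return resolver.resolveEntity(publicId, systemId);
    }
}</programlisting>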

<para xml:id='p16'>The complete class file
<link xlink:href="examples/ResponseServlet.java">is available</link>
if you want to see all the code. I hacked the “hello2” example from
Chapter 2 of
<link xlink:href="http://java.sun.com/javaee/5/docs/tutorial/doc/">The
Java EE 5 Tutorial</link>. It treats the “name” you give it as a URI
and attempts to parse it.</para>
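
<para>If you'd like to see the wiring in isolation, without the servlet
or the catalog machinery, here is a minimal sketch. It is not the code
from the servlet: the class name <literal>LocalDtdDemo</literal>, the
one-line stand-in DTD, and the decision to refuse anything that isn't
mapped are all assumptions made for illustration, and the hand-written
lookup merely stands in for the catalog resolver. What it shows is that
a SAX handler's <literal>resolveEntity</literal> method sees the public
and system identifiers before anything is fetched, and that the parser
reads whatever <literal>InputSource</literal> the method returns.</para>

<programlisting language="java"><![CDATA[import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdDemo {
    static final String PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";

    // Hypothetical stand-in for the local copy of the DTD (not the real XHTML DTD).
    static final String STAND_IN_DTD = "<!ELEMENT html (#PCDATA)>";

    // A document that points at the W3C copy of the DTD.
    static final String DOCUMENT =
        "<!DOCTYPE html PUBLIC \"" + PUBLIC_ID + "\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html>hello</html>";

    static class LocalHandler extends DefaultHandler {
        final List<String> resolved = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws IOException, SAXException {
            if (PUBLIC_ID.equals(publicId)) {
                // Serve the local copy; the parser never opens the system identifier.
                resolved.add(publicId);
                return new InputSource(new StringReader(STAND_IN_DTD));
            }
            // Assumption for this sketch: refuse unmapped entities outright
            // rather than letting the parser go to the network.
            throw new SAXException("No local copy for " + systemId);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static LocalHandler parse() throws Exception {
        LocalHandler handler = new LocalHandler();
        // SAXParser.parse installs the handler as the entity resolver too.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(DOCUMENT)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        LocalHandler handler = parse();
        System.out.println("Resolved locally: " + handler.resolved);
        System.out.println("Document text: " + handler.text);
    }
}]]></programlisting>

<para>In a real application you would return the result of the catalog
resolver's <literal>resolveEntity</literal> call instead of the
hard-coded comparison, exactly as in the handler above.</para>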

<para xml:id='p17'>In the GlassFish context, there's one more step. You have to configure
the server so that it sets the <literal>xml.catalog.files</literal> system
property to point to your catalog. (There are other ways of getting to
the right catalog, but this is the simplest.)</para>

<para xml:id='p18'>I added the system property to the domain configuration file:</para>

<programlisting><![CDATA[<system-property name="xml.catalog.files" value="file:///tmp/catalog.xml"/>]]></programlisting>

<para xml:id='p19'>Of course, <filename>/tmp/</filename> is a silly place to put
the file, but it was enough for this demonstration.</para>
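
<para>One of the other ways of getting to the right catalog is to hand
its location to a resolver directly in code. The sketch below is an
assumption-laden illustration rather than what the servlet does: it
uses the public <literal>javax.xml.catalog</literal> API that ships with
Java 9 and later, not the Apache resolver described in this essay, and
the class name <literal>CatalogLookupDemo</literal>, the helper methods,
and the temporary directory are all made up for the example. It writes
a one-entry catalog to disk and asks the resolver where the XHTML
public identifier maps to.</para>

<programlisting language="java"><![CDATA[import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import org.xml.sax.InputSource;

public class CatalogLookupDemo {
    // A cut-down version of the catalog shown earlier (one entry only).
    static final String CATALOG =
        "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\""
        + " prefer=\"public\">"
        + "<public publicId=\"-//W3C//DTD XHTML 1.0 Transitional//EN\""
        + " uri=\"xhtml1-transitional.dtd\"/>"
        + "</catalog>";

    // Hypothetical helper: writes the catalog to a fresh temporary directory.
    static URI writeDemoCatalog() throws IOException {
        Path dir = Files.createTempDirectory("catalog-demo");
        Path catalog = dir.resolve("catalog.xml");
        Files.write(catalog, CATALOG.getBytes(StandardCharsets.UTF_8));
        return catalog.toUri();
    }

    // Returns the URI the catalog maps the identifiers to. The relative
    // uri attribute in the catalog is resolved against the catalog's location.
    static String lookup(URI catalogUri, String publicId, String systemId) {
        CatalogResolver resolver =
            CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);
        InputSource source = resolver.resolveEntity(publicId, systemId);
        return source.getSystemId();
    }

    public static void main(String[] args) throws IOException {
        URI catalogUri = writeDemoCatalog();
        System.out.println(lookup(catalogUri,
            "-//W3C//DTD XHTML 1.0 Transitional//EN",
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"));
    }
}]]></programlisting>

<para>The lookup only reports where the catalog points; it doesn't check
that the local copy of the DTD actually exists, so you still need to put
the files next to the catalog.</para>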

<para xml:id='p20'>Not only do you get the benefit of being a better net citizen by
using a resolver to reduce your burden on your net neighbors, but you
may see a performance improvement as well. No matter how good your
server bandwidth is, it's still slower to hit the net than your local
file system.</para>

<para xml:id='p21'>Next time, we'll look at using some slightly more
<link xlink:href="/2007/02/06/xmlresolver">bleeding edge</link> code
to avoid the task of constructing the catalog by hand.</para>

</essay>
