<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'>
<info>
<title>SXPipe: Simple XML Pipelines</title>
<volumenum>7</volumenum>
<issuenum>103</issuenum>
<pubdate>2004-06-20T16:50:00-04:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2004</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>SXPipe is a language for building Simple XML Pipelines and a
Java toolkit that implements it. This is hardly a new idea; a quick
web search will turn up a number of similar projects. I’ve written
elsewhere about why I did it and why I think pipelines are important.
This essay just describes SXPipe.</para>
</abstract>
</info>

<para xml:id="id1">SXPipe is a language for building Simple XML Pipelines and a
Java toolkit that implements it.
This is hardly a new idea; a quick web search will turn up a number
of similar projects. I’ve written elsewhere about
<link xlink:href="xmlactivity">why I did it</link> and why I think
<link xlink:href="pipelines">pipelines are important</link>. This essay
just describes SXPipe.</para>

<para xml:id="id2">SXPipe loads a document, subjects it to a number of processing
stages, and (usually) writes out the result. Along the way, stages may 
load additional documents, but the essential model is that a pipeline
functions as a simple linear sequence of operations over an Infoset.
(Pragmatically, the Infoset is modelled with a Document object from the
W3C Document Object Model.)
</para>
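
<para>To make the linear model concrete, here is a minimal sketch in Java.
It is not SXPipe’s code: the <code>Step</code> interface, the
<code>runAll</code> method, and the <code>mark</code> helper are names
invented for illustration. It only shows the core idea of handing one DOM
<code>Document</code> from each step to the next.</para>

<programlisting><![CDATA[
```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Illustrative sketch: a pipeline as a linear sequence of DOM-to-DOM steps. */
class LinearPipelineSketch {

    /** Hypothetical step type; this is not SXPipe's PipelineStage interface. */
    interface Step {
        Document apply(Document input);
    }

    /** Feeds the Document returned by each step to the next one, in order. */
    static Document runAll(List<Step> steps, Document input) {
        Document current = input;
        for (Step step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    /** Builds a tiny in-memory document so the sketch needs no input file. */
    static Document newDocument(String rootName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(rootName));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Example step: records its name on the root element, then passes the DOM on. */
    static Step mark(String name) {
        return doc -> {
            Element root = doc.getDocumentElement();
            String seen = root.getAttribute("seen");
            root.setAttribute("seen", seen.isEmpty() ? name : seen + "," + name);
            return doc;
        };
    }

    public static void main(String[] args) {
        Document out = runAll(
                List.of(mark("XInclude"), mark("Transform"), mark("Validate")),
                newDocument("doc"));
        // Prints the step names in the order in which they were applied.
        System.out.println(out.getDocumentElement().getAttribute("seen"));
    }
}
```
]]></programlisting>

<para>Each step sees the complete document produced by the one before it,
which is what keeps the individual steps simple.</para>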

<para xml:id="id3">A few words about what SXPipe isn’t:</para>

<orderedlist>
<listitem>
<para xml:id="id4">SXPipe isn’t implemented as a series of SAX Filters; instead,
the stages of the pipeline operate by passing Infosets along.</para>
<para xml:id="id5">I don’t think there’s anything intrinsically better (or worse)
about this strategy than using SAX Filters, but it <emphasis>feels</emphasis>
a little different to me and it makes the stages very simple.</para>
<para xml:id="id6">There are probably good arguments in favor of the SAX approach;
certainly a good, streaming pipeline implementation will be able to
begin producing output faster and might require a smaller memory
footprint, but neither of those things is particularly important to
me.</para>
</listitem>
<listitem>
<para xml:id="id7">SXPipe isn’t part of a larger framework. It runs from the
command line and stands by itself: no web servers, no servlets, no
containers, no content management infrastructure. It’s just a
pipeline.</para>
<para xml:id="id8">There’s nothing to stop you embedding it in another application,
but that’s not how it works now.</para>
</listitem>
<listitem>
<para xml:id="id9">SXPipe doesn’t have a complex expression language. It has one
very primitive conditionality feature (maybe one too many). You can’t
write loops, or track dependencies, or directly instantiate complex
nested transformations. It’s just a pipeline.</para>
</listitem>
</orderedlist>

<para xml:id="id10">What’s it good for? It’s good for reasonably straightforward
pipelines like this one:</para>

<programlisting><![CDATA[<pipeline>
<stage process="XInclude"/>
<stage process="Transform" stylesheet="profile.xsl"/>
<stage process="Validate" schema="schema.rng"/>
<stage process="Transform" stylesheet="doc.xsl"/>
</pipeline>]]></programlisting>

<para xml:id="id11">It is intentionally a lot simpler than shell scripts,
<application>make</application> files, or
<application>ant</application> build scripts. Running it
requires nothing more complex than the jar file that contains the
classes:</para>

<programlisting>java Pipeline pipe.xml &lt; input.xml &gt; output.xml</programlisting>

<para xml:id="id12">Here, <filename>pipe.xml</filename> is your pipeline file, like
the one shown above, and <filename>input.xml</filename> and
<filename>output.xml</filename> are your input and output, respectively.</para>

<section xml:id="language">
<title>The Language</title>

<para xml:id="id13">The language consists of four elements: <tag>pipeline</tag>,
<tag>param</tag>, <tag>stage</tag>, and <tag>choose</tag>.
The <tag>pipeline</tag> element is just the document element,
<tag>param</tag> lets you set some simple parameters, and
<tag>stage</tag> and <tag>choose</tag> do all the actual work.</para>

<para xml:id="id14">The one conditionality feature is that each stage has an optional
<tag class="attribute">skip</tag> attribute. If 
<tag class="attribute">skip</tag> is “yes”, then the stage is ignored.
The <tag>choose</tag> element lets you make sure that exactly one
of a list of stages is executed: the first one that isn’t skipped.</para>

<para xml:id="id15">Here’s a slightly more complicated example:</para>

<programlisting><![CDATA[<pipeline>
  <param name="draft" value="no"/>

  <stage skip="${draft}" process="XInclude"/>
  <choose>
    <stage skip="${draft}"
           process="Transform"
           stylesheet="profile.xsl"/>
    <stage process="Transform"
           stylesheet="strip.xsl"/>
  </choose>
  <stage skip="${draft}" process="Validate" schema="schema.rng"/>
  <stage process="Transform" stylesheet="doc.xsl"/>
</pipeline>]]></programlisting>

<para xml:id="id16">If the <parameter>draft</parameter> parameter is “no”, this pipeline
will perform XInclude, then Transform with the <filename>profile.xsl</filename>
stylesheet (which fulfills the <tag>choose</tag>),
then Validate, then Transform with the <filename>doc.xsl</filename>
stylesheet.</para>

<para xml:id="id17">If the <parameter>draft</parameter> parameter is “yes” (which could
be specified on the command line), XInclude and the profiling Transform are
skipped, so the <tag>choose</tag> falls through to the Transform with
<filename>strip.xsl</filename>. Validate is skipped as well, so the pipeline
finishes with the Transform with the <filename>doc.xsl</filename>
stylesheet.</para>
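
<para>The two walk-throughs above can be mimicked with a small sketch. It is
an illustration, not SXPipe’s implementation: the class and method names are
invented, and it assumes that a <code>${name}</code> reference makes up the
whole attribute value and that an unset parameter leaves the reference
untouched.</para>

<programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of skip and choose; not SXPipe's actual code. */
class SkipChooseSketch {

    /** A stage reduced to a label and its raw skip attribute (null if absent). */
    static final class Stage {
        final String label;
        final String skip;

        Stage(String label, String skip) {
            this.label = label;
            this.skip = skip;
        }
    }

    /** Assumption: "${name}" must be the whole value to be replaced. */
    static String expand(String value, Map<String, String> params) {
        if (value != null && value.startsWith("${") && value.endsWith("}")) {
            String name = value.substring(2, value.length() - 1);
            return params.getOrDefault(name, value);
        }
        return value;
    }

    /** A stage is ignored only when its expanded skip attribute is "yes". */
    static boolean skipped(Stage stage, Map<String, String> params) {
        return "yes".equals(expand(stage.skip, params));
    }

    /** choose: the first alternative that is not skipped, or null if none. */
    static Stage choose(List<Stage> alternatives, Map<String, String> params) {
        for (Stage stage : alternatives) {
            if (!skipped(stage, params)) {
                return stage;
            }
        }
        return null;
    }

    /** Lists, in order, the stages the example pipeline above would run. */
    static List<String> plan(Map<String, String> params) {
        List<Stage> candidates = new ArrayList<>();
        candidates.add(new Stage("XInclude", "${draft}"));
        candidates.add(choose(List.of(
                new Stage("Transform profile.xsl", "${draft}"),
                new Stage("Transform strip.xsl", null)), params));
        candidates.add(new Stage("Validate schema.rng", "${draft}"));
        candidates.add(new Stage("Transform doc.xsl", null));

        List<String> labels = new ArrayList<>();
        for (Stage stage : candidates) {
            if (stage != null && !skipped(stage, params)) {
                labels.add(stage.label);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(plan(Map.of("draft", "no")));
        System.out.println(plan(Map.of("draft", "yes")));
    }
}
```
]]></programlisting>

<para>With <parameter>draft</parameter> set to “no” the plan contains four
stages; with “yes” it contains only the two Transforms with
<filename>strip.xsl</filename> and <filename>doc.xsl</filename>.</para>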

<para xml:id="id18">There’s a little more detail in the Javadoc, but clearly I should write
a real spec. (Yeah, the irony is plain to me; thanks for asking.)</para>

</section>

<section xml:id="implementation">
<title>The Implementation</title>

<para xml:id="id19">I’ve coded up an implementation in Java. I’m still in the
process of setting up a home for it, so I don’t have pointers to the sources yet. I
expect that will resolve itself fairly quickly.</para>

<para xml:id="id20">The implementation is built on top of
<link xlink:href="http://weblogs.java.net/pub/wlg/1427">Java 1.5.0</link>
because (a) 1.5 is really cool, (b) 1.5 includes JAXP 1.3 out of the box,
and (c) well, it’s good for my career to be testing the latest releases,
right? :-)</para>

<para xml:id="id21">In practice, I haven’t started using any of the
<link xlink:href="http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html">cool
new Java 1.5 features</link> like generics, metadata, typesafe enumerations,
and autoboxing, though I make no promises once Java 1.5.0 has officially
shipped. Until then, it should run under Java 1.3 or 1.4, provided that
JAXP 1.3 is available.</para>

<para xml:id="id22">Out of the box, SXPipe implements six stages: reading, writing,
XInclude processing, XSLT transformation, validation, and a no-op
identity stage. I’ll probably code up an XSL FO processor stage at some point,
and of course, you can write your own.</para>

<para xml:id="id23">The <code>PipelineStage</code> interface is nothing more than:</para>

<programlisting>
public interface PipelineStage {
  /**
   * &lt;p>Initializes the pipeline stage.&lt;/p>
   *
   * @param config The PipelineConfiguration used by this
   * pipeline.
   * @param stage The &lt;code>stage&lt;/code> element that is
   * being processed.
   * @throws PipelineException If there is something wrong.
   */
  public void init(PipelineConfiguration config,
                   Element stage) throws PipelineException;

  /**
   * &lt;p>Run the stage.&lt;/p>
   *
   * @param input The input DOM.
   * @throws PipelineException If there is something wrong.
   * For example, if the attempt to load a schema needed for
   * validation failed.
   * @throws StageFailedException If the stage executed
   * properly but was unsuccessful. For example, if the
   * stage was able to validate the document but the
   * document was not valid.
   * @return The output DOM.
   */
  public Document run(Document input)
     throws PipelineException, StageFailedException;
}
</programlisting>
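
<para>As an illustration of writing your own stage, here is a minimal sketch
of a class that implements the interface. Several things in it are
assumptions: the nested <code>PipelineConfiguration</code>,
<code>PipelineException</code>, and <code>StageFailedException</code> classes
are empty stand-ins so that the example compiles on its own, and
<code>RequireRootStage</code> and its <code>root</code> attribute are
invented for this example. The point is the split between a misconfigured
stage (<code>PipelineException</code>) and a stage that ran but did not
succeed (<code>StageFailedException</code>).</para>

<programlisting><![CDATA[
```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Illustrative sketch of a user-written stage; not part of SXPipe. */
class CustomStageSketch {

    // Stand-ins so the sketch compiles alone; the real classes ship with SXPipe.
    static class PipelineConfiguration {}

    static class PipelineException extends Exception {
        PipelineException(String message) { super(message); }
    }

    static class StageFailedException extends Exception {
        StageFailedException(String message) { super(message); }
    }

    /** Same two methods as the interface shown above, without the comments. */
    interface PipelineStage {
        void init(PipelineConfiguration config, Element stage)
                throws PipelineException;
        Document run(Document input)
                throws PipelineException, StageFailedException;
    }

    /** Passes the document through unchanged if its root element has the expected name. */
    static class RequireRootStage implements PipelineStage {
        private String expected;

        public void init(PipelineConfiguration config, Element stage)
                throws PipelineException {
            // "root" is an attribute name invented for this example.
            expected = stage.getAttribute("root");
            if (expected.isEmpty()) {
                throw new PipelineException("stage needs a root attribute");
            }
        }

        public Document run(Document input) throws StageFailedException {
            String actual = input.getDocumentElement().getNodeName();
            if (!expected.equals(actual)) {
                throw new StageFailedException(
                        "expected <" + expected + "> but found <" + actual + ">");
            }
            return input;
        }
    }

    /** Runs the stage on a one-element document and reports how it ended. */
    static String demo(String expectedRoot, String actualRoot) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement(actualRoot));
            Element stage = doc.createElement("stage");
            stage.setAttribute("root", expectedRoot);

            PipelineStage check = new RequireRootStage();
            check.init(new PipelineConfiguration(), stage);
            check.run(doc);
            return "passed";
        } catch (StageFailedException e) {
            return "stage failed";
        } catch (PipelineException | ParserConfigurationException e) {
            return "pipeline error";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("essay", "essay"));
        System.out.println(demo("essay", "book"));
        System.out.println(demo("", "essay"));
    }
}
```
]]></programlisting>

<para>How a stage class is registered under a <tag class="attribute">process</tag>
name is not covered here, so the sketch calls <code>init</code> and
<code>run</code> directly.</para>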

<para xml:id="id24">Finally, SXPipe is the result of a few late nights of
<link xlink:href="xmlactivity#anger">coding in anger</link>.
If it’s never good for anything else, it was good for my soul.</para>

</section>
</essay>
