<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#">
<info>
<title>Why Pipelines?</title><biblioid class="uri">http://norman.walsh.name/2004/06/20/pipelines</biblioid>
<volumenum>7</volumenum>
<issuenum>102</issuenum>
<pubdate>2004-06-20T16:40:00-04:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2004</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>If your experience with XML documents is limited to XHTML pages
and SOAP-mediated RPC, the notion that one might want an XML Pipeline
Language may seem a bit far-fetched. What, you might ask, is your
problem?</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
</info>

<para xml:id="id1">If your experience with XML documents is limited to XHTML pages
and SOAP-mediated RPC, the notion that one might want an XML Pipeline
Language may seem a bit far-fetched. What, you might ask, is your
problem?</para>

<para xml:id="id2">In a nutshell, my problem is that my documents are rarely
authored in exactly the format in which they’re going to be used. That
means that I need to perform some processing on them before they’re in
their “final form.”</para>

<para xml:id="id3">In the very simplest case, it’s just a straight transformation
from XML to HTML (or PDF or what-have-you)</para>

<gal:photo rdf:resource="images/transform"/>

<para xml:id="id4">but the picture can quickly get more complicated. Using XInclude
is perhaps the canonical example. Suppose that I’ve broken my document
into different files. I might do this to make
collaboration easier or simply for editorial convenience. I can
(possibly) validate the individual files, but that won’t tell me the
whole story. At the very least, it won’t let me check the actual
structure of the whole document and it won’t allow me to check
referential constraints such as ID/IDREF.</para>
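
<para>A minimal sketch of such a multi-file document might look like
the following. The file names and the <literal>intro</literal> identifier
are hypothetical, invented for illustration. Each chapter can be validated
by itself, but a cross-reference from <filename>chapter2.xml</filename> to
an ID defined in <filename>chapter1.xml</filename> can only be checked
after the inclusions have been resolved:</para>

<programlisting>&lt;book xmlns:xi="http://www.w3.org/2001/XInclude"&gt;
  &lt;title&gt;A Hypothetical Book&lt;/title&gt;
  &lt;!-- chapter1.xml defines an element with the ID "intro" --&gt;
  &lt;xi:include href="chapter1.xml"/&gt;
  &lt;!-- chapter2.xml refers to it with &lt;xref linkend="intro"/&gt; --&gt;
  &lt;xi:include href="chapter2.xml"/&gt;
&lt;/book&gt;</programlisting>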

<para xml:id="id5">To process this multi-file document, I need to resolve
the XInclude elements, <emphasis>then</emphasis> validate the result,
<emphasis>then</emphasis> transform it:</para>

<gal:photo rdf:resource="images/xinclude"/>

<para xml:id="id6">Here’s one last, real-world example in a little more detail.
Document reuse is one of the oft-cited benefits of XML, and lots of
people author with reuse in mind. Usually this means adding at
least the occasional “profiling” attribute to a document. Large
parts may be common to two systems, for example, but there are still
places where system-specific text has to be added.</para>

<para xml:id="id7">Consider the following paragraph about command line parameters to
an XSLT processor. By setting the desired vendor
condition appropriately, you can render it for either <application>Saxon</application>
or <application>Xalan</application>:</para>

<programlisting>
&lt;para&gt;You can pass stylesheet parameters to
&amp;xslt; on the command line. The syntax is
&lt;phrase vendor="saxonica"&gt;name=value&lt;/phrase&gt;
&lt;phrase vendor="apache"&gt;-PARAM name value&lt;/phrase&gt;
where &lt;replaceable&gt;name&lt;/replaceable&gt; is the name
of the parameter and &lt;replaceable&gt;value&lt;/replaceable&gt;
is the string value you wish to
give it.&lt;/para&gt;</programlisting>

<para xml:id="id9">Processing this document involves a two-stage transformation: first to
build the “profiled” source file and then to build the result. Adding the
names of the external files used in this process, our augmented XInclude and
Profile pipeline looks like this:</para>

<gal:photo rdf:resource="images/profile"/>
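
<para>The profiling stage itself can be a small stylesheet. What follows
is only a sketch of what a <filename>profile.xsl</filename> might contain,
not the stylesheet actually used here. It assumes a single hypothetical
<literal>vendor</literal> parameter and exact matching on the
<literal>vendor</literal> attribute. Elements marked for another vendor
are dropped and everything else is copied through unchanged:</para>

<programlisting>&lt;xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&gt;

  &lt;!-- Hypothetical parameter: the vendor value to keep --&gt;
  &lt;xsl:param name="vendor" select="'saxonica'"/&gt;

  &lt;!-- Identity rule: copy everything through by default --&gt;
  &lt;xsl:template match="@*|node()"&gt;
    &lt;xsl:copy&gt;
      &lt;xsl:apply-templates select="@*|node()"/&gt;
    &lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Profiled elements: copy only when the vendor matches --&gt;
  &lt;xsl:template match="*[@vendor]"&gt;
    &lt;xsl:if test="@vendor = $vendor"&gt;
      &lt;xsl:copy&gt;
        &lt;xsl:apply-templates select="@*|node()"/&gt;
      &lt;/xsl:copy&gt;
    &lt;/xsl:if&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>

<para>Applied to the paragraph above with <literal>vendor</literal> set to
<literal>apache</literal>, this sketch would keep only the
<literal>-PARAM name value</literal> phrase. Using the syntax that
paragraph describes, that means passing <literal>vendor=apache</literal>
to <application>Saxon</application> or <literal>-PARAM vendor apache</literal>
to <application>Xalan</application>.</para>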

<para xml:id="id10">So building a finished document involves a series of stages.
Authors typically attack this problem with shell scripts and build
tools like <application>make</application> and
<application>ant</application>.</para>

<para xml:id="id11">That’s all well and good, but the reality is that these tools
are overkill for the job (though it’s certainly possible to construct
document processing scenarios where they aren’t), and they’re difficult
for non-programmers to install and use.</para>

<para xml:id="id12">All we really need here is a simple declarative language for composing
pipelines. I’m not saying that this would reduce the learning curve for document
processing applications to zero, but I do think it would make the curve
less precipitous.</para>

<para xml:id="id13">Take my word for it; the <filename>Makefile</filename> or
<application>ant</application> <filename>build.xml</filename> script
for the profiling scenario I described above would be a lot more
complicated than this:</para>

<programlisting>&lt;pipeline&gt;
&lt;stage process="XInclude"/&gt;
&lt;stage process="Transform" stylesheet="profile.xsl"/&gt;
&lt;stage process="Validate" schema="schema.rng"/&gt;
&lt;stage process="Transform" stylesheet="doc.xsl"/&gt;
&lt;/pipeline&gt;</programlisting>

<para xml:id="id14">But that’s really all that’s needed.</para>
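
<para>The same notation, which is only an illustration and not an
existing language, covers the simpler cases as well. Assuming the same
hypothetical file names, the XInclude, validate, and transform example
from earlier might be written as:</para>

<programlisting>&lt;pipeline&gt;
&lt;stage process="XInclude"/&gt;
&lt;stage process="Validate" schema="schema.rng"/&gt;
&lt;stage process="Transform" stylesheet="doc.xsl"/&gt;
&lt;/pipeline&gt;</programlisting>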

<para xml:id="id15">I did some work in this area before; I helped edit an
<link xlink:href="http://www.w3.org/TR/2002/NOTE-xml-pipeline-20020228/">XML
Pipeline Definition Language</link> specification for the
<link xlink:href="http://www.w3.org/XML/2001/07/XMLPM.html">XML Processing Model
Workshop</link>. But in retrospect, useful as it was for the workshop,
that language is too complicated for “version one” of a specification
in this space. I think what I’ve outlined above is a lot closer to the
80/20 point.</para>

<para xml:id="id16">Yes, some pipelines require conditional processing, loops, and
complex dependency management, but a whole lot of them don’t. And
getting something into the hands of beleaguered users that allows them
to write a whole lotta simple pipelines would be a good
thing.</para>

<para xml:id="id17">Having a standard way to do this would offer all the advantages that
standardization usually brings: vendor support, interoperability, and eventually
ubiquity, to name just a few.</para>

<para xml:id="id18">But since I’m <link xlink:href="xmlactivity">not going to get that</link>,
I’m <link xlink:href="sxpipe">rolling my own</link>.</para>

<para xml:id="id19">Share and enjoy!</para>

</essay>

