<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
<title>Implementing XProc, I</title><biblioid class="uri">http://norman.walsh.name/2007/04/25/implXProcI</biblioid>
<volumenum>10</volumenum>
<issuenum>38</issuenum>
<pubdate>2007-04-25T07:37:17-04:00</pubdate>
<date>$Date: 2007-04-25 07:54:51 -0400 (Wed, 25 Apr 2007) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2007</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>Part the first, in which we consider the heart of the problem.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#Java"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XProc"/>
</info>

<para xml:id="p1">This essay is part of a series of essays about implementing an
<wikipedia page="XML_pipeline">XProc</wikipedia> processor.
<citetitle xlink:href="http://www.w3.org/TR/xproc/">XProc: An XML Pipeline
Language</citetitle> is a W3C specification that defines a language for describing
a sequence of operations to be performed on one or more XML documents. I'm
<link xlink:href="http://xproc.dev.java.net/">implementing XProc</link> as
the specification progresses. Elsewhere you'll find background
<link xlink:href="http://norman.walsh.name/2004/06/20/pipelines">about
pipelines</link> and other essays
<link xlink:href="http://norman.walsh.name/knows/what/xproc">about XProc</link>.</para>

<para xml:id="p2">I hope that my implementation evolves to be complete and robust; I also
hope that it achieves respectable performance, but
those are not the most important immediate goals. The most important immediate goal
is to produce a conformant implementation of the whole spec. I'll cross the other
bridges when I get to them. Whenever I've faced a decision about how something should
be implemented, I have, without reservation, chosen the answer that seemed easiest.
</para>

<para xml:id="p3">With that preamble out of the way, let's start in the middle.</para>

<para xml:id="p4">At the end of the day, the fundamental operation that an XML
pipeline processor performs is passing the output of one
step to the input of another. Consider a simple, two-step pipeline that
expands XIncludes and then runs XSLT. At a high level, the processor:
</para>

<orderedlist>
<listitem>
<para xml:id="p5">Starts with an XML document (where that initial document comes from is
an orthogonal issue).</para>
</listitem>
<listitem>
<para xml:id="p6">Passes that XML document to an XInclude step.</para>
</listitem>
<listitem>
<para xml:id="p7">The XInclude step does some work and produces, as its output, a new
XML document.</para>
</listitem>
<listitem>
<para xml:id="p8">The processor takes that new document and a stylesheet document
and passes them both to an XSLT step.</para>
</listitem>
<listitem>
<para xml:id="p9">The XSLT step does some work and produces, as its output, a new XML
document.</para>
</listitem>
<listitem>
<para xml:id="p10">That document is the result of the pipeline (for the moment,
as with the initial document, what the processor does with the final
result is an orthogonal issue).</para>
    </listitem>
</orderedlist>
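<para>For concreteness, the numbered steps above can be sketched as a hard-wired
Java program that uses only the standard JAXP APIs. This is an illustrative sketch
under simplifying assumptions, not how an XProc processor is built: the class
<classname>TwoStepPipeline</classname>, its methods, and the file names are
hypothetical, the two steps are simply called in order, and a complete DOM is
passed from one to the next.</para>

<programlisting language="java"><![CDATA[import java.io.File;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical, hard-wired version of the two-step pipeline:
// XInclude expansion followed by an XSLT transformation.
class TwoStepPipeline {
    // Step 1: parse the input document, expanding xi:include elements.
    static Document xinclude(File input) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // XInclude processing needs namespaces
        dbf.setXIncludeAware(true);   // expand xi:include while parsing
        return dbf.newDocumentBuilder().parse(input);
    }

    // Step 2: apply a stylesheet to the document produced by step 1.
    static String xslt(Document doc, String stylesheet) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Writes two small input files, then runs step 1 and step 2 in order.
    static String run() throws Exception {
        Path dir = Files.createTempDirectory("pipeline");
        Files.writeString(dir.resolve("chapter.xml"), "<chapter>Hello</chapter>");
        Files.writeString(dir.resolve("book.xml"),
                "<book xmlns:xi='http://www.w3.org/2001/XInclude'>"
                + "<xi:include href='chapter.xml'/></book>");
        String stylesheet =
                "<xsl:stylesheet version='1.0'"
                + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>Title: <xsl:value-of select='/book/chapter'/>"
                + "</xsl:template></xsl:stylesheet>";
        Document expanded = xinclude(dir.resolve("book.xml").toFile()); // step 1
        return xslt(expanded, stylesheet);                              // step 2
    }
}]]></programlisting>

<para>Calling <methodname>TwoStepPipeline.run()</methodname> should return the text
<literal>Title: Hello</literal>: the XInclude step pulls
<filename>chapter.xml</filename> into <filename>book.xml</filename>, and the
stylesheet then extracts the included content.</para>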

<para xml:id="p11">The first question, then, is how we are going to pass documents
from one step to the next.</para>

<para xml:id="p12">There are lots of possibilities: documents could be passed as serialized
octet streams, of course, or, more efficiently, as
<wikipedia page="Document_Object_Model">DOM</wikipedia> trees
or some other in-memory object model. The steps could be wired together as
<wikipedia page="Simple_API_for_XML">SAX</wikipedia> or
<wikipedia>StAX</wikipedia> filters, or StAX
events could be passed between them. There are probably other choices too.</para>
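<para>To make one of those alternatives concrete, here is a rough sketch of a step
written as a SAX filter, using the filter-chaining support in the standard
<classname>XMLFilterImpl</classname> class. The classes
<classname>UpperCaseFilter</classname> and <classname>SaxChainDemo</classname> are
hypothetical; the filter simply upper-cases character data, and an identity
transform serializes whatever comes out of the end of the chain.</para>

<programlisting language="java"><![CDATA[import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical "step" written as a SAX filter: it upper-cases character data
// and passes every other SAX event through unchanged.
class UpperCaseFilter extends XMLFilterImpl {
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String upper = new String(ch, start, length).toUpperCase(Locale.ROOT);
        super.characters(upper.toCharArray(), 0, upper.length());
    }
}

class SaxChainDemo {
    static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();

        // Wire parser -> filter; further filters could be chained the same way.
        UpperCaseFilter filter = new UpperCaseFilter();
        filter.setParent(parser);

        // An identity transform serializes whatever the last filter emits.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                           new StreamResult(out));
        return out.toString();
    }
}]]></programlisting>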

<para xml:id="p13">In this particular case, I know a little bit about what lies
down the road. I know that some steps will have to accept multiple
inputs and I know that some output streams will have to be “split” so
that multiple steps can use them. I also know that while some
components require whole documents, many can operate on streams, never
needing the entire document at once.</para>
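<para>Splitting a stream can be sketched with StAX events by reading the stream
once and handing each event to every downstream branch. In this minimal sketch the
class <classname>EventTee</classname> is hypothetical, each branch is just an
in-memory list rather than a pipe, and sharing one event object between branches
assumes that consumers never modify events (the <classname>XMLEvent</classname>
interface offers no mutators).</para>

<programlisting language="java"><![CDATA[import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical "tee": reads one event stream and copies every event
// into each of several downstream branches.
class EventTee {
    static List<List<XMLEvent>> split(XMLEventReader in, int branches)
            throws XMLStreamException {
        List<List<XMLEvent>> outputs = new ArrayList<>();
        for (int i = 0; i < branches; i++) {
            outputs.add(new ArrayList<>());
        }
        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();   // read each event exactly once
            for (List<XMLEvent> branch : outputs) {
                branch.add(event);             // every branch sees the same object
            }
        }
        return outputs;
    }

    // Convenience wrapper: parse a string and split its events.
    static List<List<XMLEvent>> splitString(String xml, int branches)
            throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            return split(in, branches);
        } finally {
            in.close();
        }
    }
}]]></programlisting>

<para>Collecting every event into a list like this gives up streaming; a streaming
version would presumably feed each branch through its own pipe instead.</para>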

<para xml:id="p14">With those things in mind, I chose to implement the connections between
steps using StAX
“<link xlink:href="http://java.sun.com/javase/6/docs/api/javax/xml/stream/events/XMLEvent.html">XMLEvent</link>”
objects. This approach has the added benefit of fitting
perfectly into the “water flowing through pipes” analogy that's sometimes
used to describe pipelines.</para>
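<para>As a rough illustration of what these objects are, the following sketch asks a
StAX <classname>XMLEventReader</classname> for the events in a small document and
describes each one. The class <classname>EventListing</classname> is hypothetical,
and only the event types needed for the example are handled.</para>

<programlisting language="java"><![CDATA[import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Lists the XMLEvent objects a StAX reader produces for a document:
// these are the units that would flow through a pipe.
class EventListing {
    static List<String> describe(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character data into a single Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartDocument()) {
                out.add("start-document");
            } else if (event.isStartElement()) {
                out.add("start " + event.asStartElement().getName().getLocalPart());
            } else if (event.isCharacters()) {
                out.add("text \"" + event.asCharacters().getData() + "\"");
            } else if (event.isEndElement()) {
                out.add("end " + event.asEndElement().getName().getLocalPart());
            } else if (event.isEndDocument()) {
                out.add("end-document");
            }
        }
        reader.close();
        return out;
    }
}]]></programlisting>

<para>For the document <code>&lt;doc&gt;&lt;p&gt;hi&lt;/p&gt;&lt;/doc&gt;</code>,
<methodname>describe</methodname> should report seven events: the document start,
the start tags for <code>doc</code> and <code>p</code>, a text event containing
<literal>hi</literal>, the two end tags, and the document end.</para>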

<para xml:id="p15">A pipeline is a sequence (or, at any rate, a directed acyclic graph) of steps.
The steps are connected by pipes. Just as water flows through the pipes in your
home, <classname>XMLEvent</classname> objects flow through the pipes in my
XProc pipelines.
</para>

<para xml:id="p16">Pipes naturally have a readable end, a faucet you can draw from, and a writable
end, a drain into which you can pour things. From inside a step, you can see only
the ends of the pipe: sources and sinks, readable pipes and writable pipes.
The pipeline processor can see the whole pipe. It looks something like this:
</para>

<programlisting>import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;

public class Pipe implements ReadablePipe, WritablePipe {
    public XMLEventWriter getWriter() { … }
    public XMLEventReader getReader() { … }
    …
}</programlisting>

<para xml:id="p17">(There's more to it, of course, but we'll come back to look at other
aspects of pipes later. In particular, we're going to have to deal with
sequences of documents.)</para>

<para xml:id="p18">The step holding the writable end of the pipe can get the
<classname>XMLEventWriter</classname> and
pour events into it. The step holding the readable end of the pipe can
get the <classname>XMLEventReader</classname> and read events from it.</para>
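<para>From a step's point of view, that might look something like the sketch below.
It assumes a hypothetical step, <classname>UpperCaseStep</classname>, that is handed
the two ends as a plain <classname>XMLEventReader</classname> and
<classname>XMLEventWriter</classname>; for the demo, a reader and a writer over
strings stand in for real pipes.</para>

<programlisting language="java"><![CDATA[import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical step that sees only the two pipe ends it was handed: it draws
// events from the reader and pours (possibly changed) events into the writer.
class UpperCaseStep {
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    static void run(XMLEventReader in, XMLEventWriter out) throws XMLStreamException {
        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();
            if (event.isCharacters()) {
                String text = event.asCharacters().getData().toUpperCase(Locale.ROOT);
                out.add(EVENTS.createCharacters(text)); // replace the text event
            } else {
                out.add(event);                         // pass everything else through
            }
        }
        out.flush();
    }

    // Demo wiring: a string-backed reader and writer stand in for the pipe ends.
    static String demo(String xml) throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter sink = new StringWriter();
        XMLEventWriter out = XMLOutputFactory.newInstance().createXMLEventWriter(sink);
        run(in, out);
        out.close();
        in.close();
        return sink.toString();
    }
}]]></programlisting>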

<para xml:id="p19">As with a real pipe, events poured into one end aren't instantaneously
drawn out of the other. And just because you open the faucet, that
doesn't mean there's water ready to flow through the pipe. So the
implementation of pipes has to provide some capacity for events in transit and must be
prepared to block the reader while it waits for the writer.</para>

<para xml:id="p20">At the moment, the pipe between the two ends is a simple
<classname>Vector</classname> of events. This will require some additional synchronization
when I enable threading, but for now, it's sufficient.</para>
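<para>A threaded pipe along those lines might look roughly like the following. This
is a minimal sketch, not the implementation described here: the class
<classname>BlockingEventPipe</classname> and its <methodname>write</methodname>,
<methodname>read</methodname>, and <methodname>close</methodname> methods are
hypothetical, it exposes plain methods rather than an
<classname>XMLEventWriter</classname> and <classname>XMLEventReader</classname>, and
its buffer is unbounded. <classname>Vector</classname>'s own locking makes each
individual call thread-safe, but it gives the reader no way to wait, so the sketch
adds <methodname>wait</methodname> and <methodname>notifyAll</methodname> on the
pipe's monitor.</para>

<programlisting language="java"><![CDATA[import java.io.StringReader;
import java.util.Vector;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical pipe buffer: an unbounded Vector of events guarded by the
// object's monitor. The reader blocks until the writer supplies an event
// or closes the pipe.
class BlockingEventPipe {
    private final Vector<XMLEvent> buffer = new Vector<>();
    private boolean closed = false;

    // Writable end: append an event and wake any waiting reader.
    synchronized void write(XMLEvent event) {
        buffer.add(event);
        notifyAll();
    }

    // Writable end: signal that no more events are coming.
    synchronized void close() {
        closed = true;
        notifyAll();
    }

    // Readable end: wait for the next event; returns null once the pipe
    // is closed and drained.
    synchronized XMLEvent read() throws InterruptedException {
        while (buffer.isEmpty() && !closed) {
            wait();
        }
        return buffer.isEmpty() ? null : buffer.remove(0);
    }

    // Demo: a writer thread parses a document into the pipe while the
    // calling thread reads from it and counts start tags.
    static int countStartElements(String xml) throws Exception {
        BlockingEventPipe pipe = new BlockingEventPipe();
        Thread writer = new Thread(() -> {
            try {
                XMLEventReader in = XMLInputFactory.newInstance()
                        .createXMLEventReader(new StringReader(xml));
                while (in.hasNext()) {
                    pipe.write(in.nextEvent());
                }
            } catch (Exception e) {
                throw new IllegalStateException(e);
            } finally {
                pipe.close(); // always release a waiting reader
            }
        });
        writer.start();
        int count = 0;
        for (XMLEvent e = pipe.read(); e != null; e = pipe.read()) {
            if (e.isStartElement()) {
                count++;
            }
        }
        writer.join();
        return count;
    }
}]]></programlisting>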

</essay>

