XProc Working Draft (6 July 2007)

Volume 10, Issue 64; 06 Jul 2007; last modified 08 Oct 2010

The XML Processing Model Working Group has published a new Working Draft of XProc: An XML Pipeline Language.

The Processing Model Working Group is driving as quickly as possible to Last Call on XProc. I think our latest draft, published today, is getting pretty close.

This draft resolves (finally, I hope) the question of how to deal with parameters. I don't think we've quite dotted the i's and crossed the t's on how the new parameter story interfaces with the outside world, but I'm confident that we'll get there.

We've also introduced a new defaulting story. I expect this will generate feedback, both positive and negative. Designing an XML language like XProc is necessarily a compromise. On the one hand, it would be possible (and is desirable) to design a completely explicit and regular language. On the other hand, it has to be something that users can reasonably be expected to write in a text editor, and that means some syntactic shortcuts are desirable.

(You may think that tools will completely obviate the need for users to look at angle brackets. If XProc is successful, you might be right. But early adopters write XML in an editor, and if early adopters don't use your language, it'll never be successful. So it has to be something users can write in a text editor. QED.)

In a nutshell, the defaulting story goes like this:

Steps can have a distinguished “primary input port” and a distinguished “primary output port” (independently).
If the user doesn't specify a binding for a primary input port, it is automatically bound to the preceding step's primary output port.
If a pipeline does not declare any input ports, and the first step in the pipeline has a primary input port, then the pipeline gets an anonymous input that's automatically bound to the primary input port of the first step.
If a pipeline does not declare any output ports, and the last step in the pipeline has a primary output port, then the pipeline gets an anonymous output that's automatically bound to the primary output port of the last step.

What all this means is, the following pipeline does the right thing:

<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc">
  <p:xinclude/>
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="someURI.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>

It accepts one document, runs XInclude on it, transforms it with the specified stylesheet, and produces the result of that transformation as its output.
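
For comparison, here is roughly what the same pipeline might look like with every default spelled out. This is a sketch under assumptions, not text from the draft: the step names (main, include, transform), the port names source and result, and the p:pipe binding syntax are my guesses at how the explicit form would be written, and the real defaulting rules give the pipeline anonymous ports rather than named ones.

```xml
<!-- Sketch only: names, port names, and binding syntax are assumed, not taken from the draft -->
<p:pipeline name="main" xmlns:p="http://www.w3.org/2007/03/xproc">
  <!-- Explicit stand-ins for the anonymous pipeline input and output -->
  <p:input port="source"/>
  <p:output port="result">
    <!-- Pipeline output bound to the last step's primary output -->
    <p:pipe step="transform" port="result"/>
  </p:output>

  <p:xinclude name="include">
    <!-- First step's primary input bound to the pipeline input -->
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:xinclude>

  <p:xslt name="transform">
    <!-- Primary input bound to the preceding step's primary output -->
    <p:input port="source">
      <p:pipe step="include" port="result"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="someURI.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```

The only binding the short version has to state is the stylesheet, because it isn't a primary input and so nothing can be defaulted for it.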

We've also done some work on the step library, generalizing p:head, p:tail, and p:matching-documents into a single p:split-sequence step; added p:count; added p:string-replace; etc.

I hope the next draft is our Last Call draft and I hope it comes out in July. What stands in our way? A solid serialization story, a solid p:http-request story, a review of our error story, some general cleanup and clarification. What else?

Comments

Please get rid of the term "port"! It adds a concept where none is needed. Just say "input" and "output" and give them a name or id. I know Scheme is wonderful, but...

Are you sure, Rick? I'm not sure it would read as clearly if we couldn't distinguish between the inputs and outputs that flow between steps and the _____ on the steps where the inputs and outputs are "attached".