Implementing XProc, IX

Volume 10, Issue 103; 06 Oct 2007; last modified 08 Oct 2010

Part the ninth, in which we arrange for you to get in on the act.

This essay is part of a series of essays about implementing an XProc processor. XProc: An XML Pipeline Language is a W3C specification for specifying a sequence of operations to be performed on one or more XML documents. I'm implementing XProc as the specification progresses. Elsewhere you'll find background about pipelines and other essays about XProc.

I think I've turned a corner on my implementation. Most of the basic functionality is in place and it's starting to feel useful.

I recently refactored the code that deals with called pipelines (i.e. when you import a pipeline or pipeline library and then call a pipeline as an atomic step). I had been trying to instantiate the compound step that is the pipeline at the location where the call occurred. It got messy.

Instead, I refactored things to leave an atomic “pipeline call” step where the call occurs. When that step is run, it instantiates the pipeline and runs it. Obvious, eh?
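Here's a rough sketch of that arrangement under heavy simplifying assumptions. Step, PipelineDefinition, and PipelineCallStep are hypothetical names, not classes from the implementation, and a "pipeline" is reduced to a function applied to each input document (a plain string). The only point it illustrates is that the call step keeps a reference to the pipeline and doesn't build anything until run() is called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class PipelineCallDemo {
    // Hypothetical stand-in for a runnable step: turns input documents into output documents.
    interface Step {
        List<String> run(List<String> inputs);
    }

    // Hypothetical stand-in for an imported pipeline. It knows how to build a
    // fresh instance of itself, but nothing is built until someone asks.
    static class PipelineDefinition {
        private final UnaryOperator<String> body;
        int instantiations = 0;

        PipelineDefinition(UnaryOperator<String> body) {
            this.body = body;
        }

        Step instantiate() {
            instantiations++;
            return inputs -> {
                List<String> out = new ArrayList<>();
                for (String doc : inputs) {
                    out.add(body.apply(doc));
                }
                return out;
            };
        }
    }

    // The atomic "pipeline call" step left where the call occurs. It holds only
    // a reference to the definition and instantiates the pipeline when run.
    static class PipelineCallStep implements Step {
        private final PipelineDefinition definition;

        PipelineCallStep(PipelineDefinition definition) {
            this.definition = definition;
        }

        @Override
        public List<String> run(List<String> inputs) {
            Step pipeline = definition.instantiate(); // built now, at run time
            return pipeline.run(inputs);
        }
    }

    public static void main(String[] args) {
        PipelineDefinition upper = new PipelineDefinition(String::toUpperCase);
        PipelineCallStep call = new PipelineCallStep(upper);
        System.out.println(upper.instantiations);              // 0: nothing built yet
        System.out.println(call.run(List.of("<a/>", "<b/>"))); // [<A/>, <B/>]
        System.out.println(upper.instantiations);              // 1
    }
}
```

A side effect of this design, visible in the instantiations counter, is that every run of the call step gets a fresh pipeline instance, so no state carries over from one call to the next.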

An unintended consequence of this change is that it makes the path to a fairly clean external interface clear. I expect lots of requests for new step types. Some of these we'll probably standardize (either as required or optional steps) and some of the ones we don't will (hopefully) wind up on http://exproc.org/.

That'll still leave some that folks want to implement for themselves. I want that to be as easy as possible in XProc. I've only been playing with it for a few hours, but this seems to be the smallest interface that will get the job done:

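import java.util.Hashtable;
import javax.xml.namespace.QName; // assumed; the essay doesn't say which QName class
import javax.xml.stream.XMLStreamException;

public interface AtomicStep {
    // Wiring: called once for each input and output port
    public void addInput(String port, ReadablePipe pipe);
    public void addOutput(String port, WritablePipe pipe);
    // Option and parameter values, plus the namespace context in effect for them
    public void addOption(QName name, String value, XProcNamespaceContext nsContext);
    public void addParameter(QName name, String value, XProcNamespaceContext nsContext);
    // Only for steps with more than one parameter input port
    public void addParameter(String port, QName name, String value, XProcNamespaceContext nsContext);
    // Return true to receive every in-scope option through setInScopeOptions
    public boolean needInScopeOptions();
    public void setInScopeOptions(Hashtable<QName,String> options);
    // Do the work: read the inputs, write the outputs
    public void run() throws XMLStreamException;
}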

The pipeline step calls the “add” methods to pass along the inputs, outputs, parameters, and options, and then calls run(). A few notes:

- The step has already been checked against its signature, so a lot of the error checking has already been done for you.
- The ReadablePipe and WritablePipe classes return readers and writers for a document or sequence of documents. That's your input and your output.
- The second form of addParameter is only used in those (rare, I expect) cases where a step has more than one parameter input port.
- Steps that need access to all the in-scope options (because, for example, they expect some of their options to be XPath expressions that might include variable references) can return “true” from needInScopeOptions, and the full set will be passed to them through setInScopeOptions.
- When run, the step should write its results to its output port(s). It should throw appropriate XProcExceptions as necessary.
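Here's a rough, self-contained sketch of that calling order. Only the AtomicStep method signatures come from the interface above. ReadablePipe, WritablePipe, and XProcNamespaceContext are replaced by hypothetical, much-simplified stand-ins (documents are plain strings, and the pipe methods moreDocuments, read, and write are invented for the example), CountStep is a made-up step, and drive is a made-up method playing the part of the pipeline. QName and XMLStreamException are the standard javax.xml classes, which may or may not be the ones the implementation uses.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLStreamException;

public class AtomicStepDemo {
    // Hypothetical, much-simplified stand-ins: documents are plain strings.
    interface XProcNamespaceContext {}

    interface ReadablePipe {
        boolean moreDocuments();
        String read();
    }

    interface WritablePipe {
        void write(String document);
    }

    // The interface from the essay, using the stand-in types above.
    interface AtomicStep {
        void addInput(String port, ReadablePipe pipe);
        void addOutput(String port, WritablePipe pipe);
        void addOption(QName name, String value, XProcNamespaceContext nsContext);
        void addParameter(QName name, String value, XProcNamespaceContext nsContext);
        void addParameter(String port, QName name, String value, XProcNamespaceContext nsContext);
        boolean needInScopeOptions();
        void setInScopeOptions(Hashtable<QName, String> options);
        void run() throws XMLStreamException;
    }

    // Hypothetical step: counts the documents on "source", writes one result to "result".
    static class CountStep implements AtomicStep {
        private final Map<String, ReadablePipe> inputs = new LinkedHashMap<>();
        private final Map<String, WritablePipe> outputs = new LinkedHashMap<>();

        public void addInput(String port, ReadablePipe pipe) { inputs.put(port, pipe); }
        public void addOutput(String port, WritablePipe pipe) { outputs.put(port, pipe); }
        public void addOption(QName name, String value, XProcNamespaceContext ns) { }
        public void addParameter(QName name, String value, XProcNamespaceContext ns) { }
        public void addParameter(String port, QName name, String value, XProcNamespaceContext ns) { }
        public boolean needInScopeOptions() { return false; } // no XPath to evaluate
        public void setInScopeOptions(Hashtable<QName, String> options) { }

        public void run() throws XMLStreamException {
            // The signature check guarantees both ports were connected.
            ReadablePipe source = inputs.get("source");
            int count = 0;
            while (source.moreDocuments()) {
                source.read();
                count++;
            }
            outputs.get("result").write("<count>" + count + "</count>");
        }
    }

    // Hypothetical driver: wire ports and options, hand over in-scope options if asked, then run.
    static void drive(AtomicStep step, Map<String, ReadablePipe> in, Map<String, WritablePipe> out,
                      Map<QName, String> options, Hashtable<QName, String> inScope,
                      XProcNamespaceContext ns) throws XMLStreamException {
        in.forEach(step::addInput);
        out.forEach(step::addOutput);
        options.forEach((name, value) -> step.addOption(name, value, ns));
        if (step.needInScopeOptions()) {
            step.setInScopeOptions(inScope);
        }
        step.run();
    }

    // List-backed ReadablePipe, also hypothetical.
    static ReadablePipe pipeOf(List<String> docs) {
        List<String> queue = new ArrayList<>(docs);
        return new ReadablePipe() {
            public boolean moreDocuments() { return !queue.isEmpty(); }
            public String read() { return queue.remove(0); }
        };
    }

    public static void main(String[] args) throws XMLStreamException {
        List<String> results = new ArrayList<>();
        Map<String, ReadablePipe> in = Map.of("source", pipeOf(List.of("<a/>", "<b/>", "<c/>")));
        Map<String, WritablePipe> out = Map.of("result", results::add);
        drive(new CountStep(), in, out, Map.of(), new Hashtable<>(), new XProcNamespaceContext() { });
        System.out.println(results); // [<count>3</count>]
    }
}
```

The driver never needs to know which concrete step it is running, and only steps that return true from needInScopeOptions are handed the full set of in-scope options.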

This interface is going to be much simpler to program to, I think, than extending parts of the implementation's class hierarchy.

Obviously, there's a little more to making this work. There's some configuration involved, and I haven't really described how reading and writing work. If you're game to try writing your own extension steps, let me know and I'll try to fill you in.

You can see it in action in the implementation of the p:directory-list and p:add-attribute steps (org.xproc.library.DirectoryList and org.xproc.library.AddAttribute, respectively). I haven't decided if I want to convert all the steps to this interface or not. I'm thinking probably yes.

Comments

Do convert them all, by way of eating your own dogfood. A disgusting image, by the way: what actual maker of dogfood eats his own product?

Why not

public void setInScopeOptions(Map<QName,String> options);

instead of

public void setInScopeOptions(Hashtable<QName,String> options);

More than that, I fear you don't give access to extension elements and extension attributes.

Xmlizer