Implementing XProc, IX
Part the ninth, in which we arrange for you to get in on the act.
This essay is part of a series of essays about implementing an XProc processor. XProc: An XML Pipeline Language is a W3C specification for specifying a sequence of operations to be performed on one or more XML documents. I'm implementing XProc as the specification progresses. Elsewhere you'll find background about pipelines and other essays about XProc.
I think I've turned a corner on my implementation. Most of the basic functionality is in place and it's starting to feel useful.
I recently refactored the code that deals with called pipelines (i.e. when you import a pipeline or pipeline library and then call a pipeline as an atomic step). I had been trying to instantiate the compound step that is the pipeline at the location where the call occurred. It got messy.
Instead, I refactored things to leave an atomic “pipeline call” step where the call occurs. When that step is run, it instantiates the pipeline and runs it. Obvious, eh?
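A minimal sketch of the idea, for illustration only. The `Step`, `Pipeline`, and `PipelineCall` types below are hypothetical stand-ins, not the classes in the actual implementation. The point is just that the call site holds a small atomic step, and the pipeline behind it is only built when that step runs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for anything that can be run as a step.
interface Step {
    void run();
}

// Hypothetical compound step: runs its substeps in order.
class Pipeline implements Step {
    private final List<Step> substeps = new ArrayList<>();

    Pipeline add(Step step) {
        substeps.add(step);
        return this;
    }

    public void run() {
        for (Step s : substeps) {
            s.run();
        }
    }
}

// Atomic placeholder left where the call occurs. The called pipeline is
// instantiated only when this step is run, not when the caller is built.
class PipelineCall implements Step {
    private final Supplier<Pipeline> factory;

    PipelineCall(Supplier<Pipeline> factory) {
        this.factory = factory;
    }

    public void run() {
        Pipeline pipeline = factory.get(); // instantiate at run time
        pipeline.run();
    }
}

public class PipelineCallDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Step call = new PipelineCall(() -> {
            log.add("instantiate");
            return new Pipeline()
                .add(() -> log.add("step-1"))
                .add(() -> log.add("step-2"));
        });
        log.add("before-run");
        call.run();
        System.out.println(log); // [before-run, instantiate, step-1, step-2]
    }
}
```

Because the enclosing pipeline only ever sees an atomic step, nothing about the called pipeline's internals leaks into the caller.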
An unintended consequence of this change is that it makes the path to a fairly clean external interface clear. I expect lots of requests for new step types. Some of these we'll probably standardize (either as required or optional steps) and some of the ones we don't will (hopefully) wind up on http://exproc.org/.
That'll still leave some that folks want to implement for themselves. I want that to be as easy as possible in XProc. I've only been playing with it for a few hours, but this seems to be the smallest interface that will get the job done:
public interface AtomicStep {
    public void addInput(String port, ReadablePipe pipe);
    public void addOutput(String port, WritablePipe pipe);
    public void addOption(QName name, String value, XProcNamespaceContext nsContext);
    public void addParameter(QName name, String value, XProcNamespaceContext nsContext);
    public void addParameter(String port, QName name, String value, XProcNamespaceContext nsContext);
    public boolean needInScopeOptions();
    public void setInScopeOptions(Hashtable<QName,String> options);
    public void run() throws XMLStreamException;
}
The pipeline step calls the “add” methods to pass along the inputs, outputs, parameters, and options, then it calls run.
A few notes:
- The step has already been checked against its signature, so a lot of the error checking has already been done for you.
- The ReadablePipe and WritablePipe classes return readers and writers for a document or sequence of documents. That's your input and your output.
- The second form of addParameter is only used in those (rare, I expect) cases where a step has more than one parameter input port.
- Steps that need access to all the in-scope options (because, for example, they expect some of their options to be XPath expressions that might include variable references) can return “true” for needInScopeOptions and the full set will be passed back in setInScopeOptions.
- When run, the class should write to the output port(s). It should throw appropriate XProcExceptions as necessary.
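To make the calling sequence concrete, here is a rough, self-contained sketch. It does not use the real interface: the essay doesn't describe how ReadablePipe and WritablePipe actually read and write, so `SketchPipe` below is an invented stand-in that carries documents as plain strings, and `SketchAtomicStep` is a cut-down version of the interface with the parameter and in-scope-option methods left out. `WrapStep` and its `wrapper` option are likewise hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import javax.xml.namespace.QName;

// Invented stand-in for ReadablePipe/WritablePipe: a queue of documents,
// each represented as a plain string.
class SketchPipe {
    private final Queue<String> docs = new ArrayDeque<>();

    void write(String doc) { docs.add(doc); }
    boolean hasNext() { return !docs.isEmpty(); }
    String read() { return docs.remove(); }
}

// Cut-down version of the interface: inputs, outputs, options, and run only.
interface SketchAtomicStep {
    void addInput(String port, SketchPipe pipe);
    void addOutput(String port, SketchPipe pipe);
    void addOption(QName name, String value);
    void run();
}

// Hypothetical extension step: wraps each document on "source" in an
// element named by the "wrapper" option and writes it to "result".
class WrapStep implements SketchAtomicStep {
    private final Map<String, SketchPipe> inputs = new HashMap<>();
    private final Map<String, SketchPipe> outputs = new HashMap<>();
    private final Map<QName, String> options = new HashMap<>();

    public void addInput(String port, SketchPipe pipe) { inputs.put(port, pipe); }
    public void addOutput(String port, SketchPipe pipe) { outputs.put(port, pipe); }
    public void addOption(QName name, String value) { options.put(name, value); }

    public void run() {
        String wrapper = options.get(new QName("wrapper"));
        SketchPipe source = inputs.get("source");
        SketchPipe result = outputs.get("result");
        while (source.hasNext()) {
            result.write("<" + wrapper + ">" + source.read() + "</" + wrapper + ">");
        }
    }
}

public class AtomicStepDemo {
    public static void main(String[] args) {
        SketchPipe in = new SketchPipe();
        in.write("<a/>");
        in.write("<b/>");
        SketchPipe out = new SketchPipe();

        // What the calling pipeline step does: the "add" calls, then run.
        SketchAtomicStep step = new WrapStep();
        step.addInput("source", in);
        step.addOutput("result", out);
        step.addOption(new QName("wrapper"), "doc");
        step.run();

        while (out.hasNext()) {
            System.out.println(out.read());
        }
    }
}
```

The step itself never has to know where its inputs come from or where its outputs go. It just stores what it is handed and does its work in run.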
This interface is going to be much simpler to program to, I think, than extending parts of the implementation's class hierarchy.
Obviously, there's a little more to making this work. There's some configuration work and I haven't really described how reading and writing work. If you're game to try writing your own extension steps, let me know and I'll try to fill you in.
You can see it in action in the implementation of the p:directory-list and p:add-attribute steps (org.xproc.library.DirectoryList and org.xproc.library.AddAttribute, respectively).
I haven't decided if I want to convert all the steps to this interface or not. I'm thinking probably yes.
Comments
Do convert them all, by way of eating your own dogfood. A disgusting image, by the way: what actual maker of dogfood eats his own product?
Why not
    public void setInScopeOptions(Map options);
instead of
    public void setInScopeOptions(Hashtable options);
?
More than that, I fear you don't give access to extension elements and extension attributes.
Xmlizer