Implementing XProc, VII

Volume 10, Issue 70; 20 Jul 2007; last modified 08 Oct 2010

Part the seventh, in which we (re)consider a fundamental part of the design.

Plan to throw one away. You will anyway.

Frederick P. Brooks

This essay is part of a series of essays about implementing an XProc processor. XProc: An XML Pipeline Language is a W3C specification for describing a sequence of operations to be performed on one or more XML documents. I'm implementing XProc as the specification progresses. Elsewhere you'll find background about pipelines and other essays about XProc.

I implemented XProc in the traditional way: I threw the first one out. In fact, I threw the first two out. And I think the third is headed for a major refactoring.

Given a pipeline, my implementation does two things: first, it builds and augments a model of the pipeline. This step makes defaults explicit, checks the validity of the pipeline, and makes a few changes that are necessary for my implementation to process it. In this model, the objects all represent “source artifacts” of one form or another.

Next, it constructs another model of the pipeline designed for execution. In this model, the objects all represent “steps” of one form or another.
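To make the two phases concrete, here is a minimal Python sketch. It is not the code from the implementation described in this essay, and every name in it (SourceStep, ExecStep, augment, validate, compile_step, the generated "anon" names) is hypothetical. It only illustrates the shape: build and augment a model of source artifacts, check it, then construct a second model of steps meant for execution. The "validation" here is reduced to a single simplified rule, unique step names.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SourceStep:
    """First model: a 'source artifact' close to the pipeline document."""
    step_type: str
    name: Optional[str] = None  # the author may leave the name out
    children: List[SourceStep] = field(default_factory=list)


def augment(step: SourceStep, counter: Optional[List[int]] = None) -> None:
    """Make defaults explicit: give every unnamed step a generated name."""
    counter = counter if counter is not None else [0]
    if step.name is None:
        counter[0] += 1
        step.name = f"anon{counter[0]}"  # hypothetical naming scheme
    for child in step.children:
        augment(child, counter)


def validate(step: SourceStep, seen: Optional[Set[str]] = None) -> None:
    """Check validity; simplified here to 'step names must be unique'."""
    seen = seen if seen is not None else set()
    if step.name in seen:
        raise ValueError(f"duplicate step name: {step.name}")
    seen.add(step.name)
    for child in step.children:
        validate(child, seen)


@dataclass
class ExecStep:
    """Second model: a 'step' designed for execution."""
    name: str
    step_type: str
    substeps: List[ExecStep]

    def run(self, depth: int = 0) -> List[str]:
        # Stand-in for real execution: record the order steps would run in.
        log = [f"{'  ' * depth}{self.step_type} ({self.name})"]
        for sub in self.substeps:
            log.extend(sub.run(depth + 1))
        return log


def compile_step(src: SourceStep) -> ExecStep:
    """Build the execution model from an augmented, validated source model."""
    return ExecStep(src.name, src.step_type,
                    [compile_step(child) for child in src.children])


source = SourceStep("p:pipeline", "main", [
    SourceStep("p:xinclude"),        # unnamed: gets a generated name
    SourceStep("p:xslt", "style"),
])
augment(source)
validate(source)
trace = compile_step(source).run()
```

In this toy version the execution objects copy what they need and never look back at the source objects, which is the clean separation the design is aiming for.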

On the surface, this seems like a good idea. Validation and execution are different processes. Separation of concerns, don't you see?

Except that, in practice, they aren't very separate. The execution model relies on “peeking” into the validation model to get namespace declarations, to find the names of steps, to reach the declarations for atomic steps, and so on. So either I've modeled things badly, or implemented the models badly, or both.

One of the things that motivated having two models was that I was anticipating the possibility of accepting multiple pipeline document syntaxes. I thought modeling the pipeline in a syntax-agnostic manner before attempting to evaluate it would make things easier.

My design vision for the XProc language was a very explicit, verbose one. Over time, the working group has found consensus in a much less explicit and verbose design with a fair number of defaults and syntactic shortcuts. There's much less impetus now to develop a “compact syntax” version of XProc. I doubt I'll ever bother.

I think I have two choices. Either accept that there's really only one model and refactor the code accordingly, or make the two models genuinely separate: make sure that all of the information needed in the second model is passed explicitly to its constructors, for example, instead of passing just a reference to the corresponding object in the first model.
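The difference between passing a reference and passing the information explicitly can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the implementation: the class names (SourceStep, CoupledExecStep, SeparateExecStep), the from_source helper, and the example namespace URIs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SourceStep:
    """First-model object: knows what the pipeline document said."""
    name: str
    step_type: str
    namespaces: Dict[str, str] = field(default_factory=dict)


class CoupledExecStep:
    """Holds a reference to the first model and peeks into it at run time."""

    def __init__(self, source: SourceStep) -> None:
        self._source = source

    def resolve(self, prefix: str) -> str:
        # Reaches back into the other model every time it is called.
        return self._source.namespaces[prefix]


@dataclass(frozen=True)
class SeparateExecStep:
    """Receives everything it needs explicitly; keeps no back-reference."""
    name: str
    step_type: str
    namespaces: Tuple[Tuple[str, str], ...]

    @classmethod
    def from_source(cls, source: SourceStep) -> SeparateExecStep:
        # Copy the data out of the first model once, at construction time.
        return cls(source.name, source.step_type,
                   tuple(sorted(source.namespaces.items())))

    def resolve(self, prefix: str) -> str:
        return dict(self.namespaces)[prefix]


src = SourceStep("style", "p:xslt", {"ex": "urn:example:original"})
coupled = CoupledExecStep(src)
separate = SeparateExecStep.from_source(src)

# A later change to the first model leaks into the coupled step only.
src.namespaces["ex"] = "urn:example:changed"
coupled_result = coupled.resolve("ex")
separate_result = separate.resolve("ex")
```

The separate version costs more plumbing in the constructors, but the second model can then be tested, and reasoned about, without the first one in hand.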

I'm leaning towards a single model at the moment, but I'm open to suggestions.


I think there is nothing at all wrong with having a tree that's close to the syntax and then building another tree that represents what is to be done at execution, nor is there anything wrong with keeping around the syntax tree and having the execution tree look into it (through a well-defined API, of course).

You are writing a compiler/interpreter pair, like Java itself. The compiler both has a data structure close to Java syntax and generates a data structure rather far from Java syntax (byte code). In addition to byte code, however, various bits of the compiler's model, symbol tables and the like, are kept around to provide information needed at runtime. If it weren't for the perceived need for a separable class file, it would be quite plausible for Java execution to use the compiler's data structures directly.

—Posted by John Cowan on 20 Jul 2007 @ 01:55 UTC #

Interesting: Norm talks about models; John replies by talking about trees. Does either model happen to be a tree?

—Posted by Ed Davies on 20 Jul 2007 @ 08:29 UTC #

That's an interesting question. The "top" of the model is the p:pipeline step and it contains an ordered list of children. Some of those children also contain an ordered list of their own children, etc. So I guess they are tree-like in that sense.

There are also connections that cross tree boundaries, but if I squint I can think of those as not unlike ID/IDREF links, so I'm not sure that makes the model un-tree-like.

But neither is intentionally a model of a tree.

—Posted by Norman Walsh on 20 Jul 2007 @ 08:44 UTC #