Rethinking XProc syntax

Volume 11, Issue 2; 07 Jan 2008; last modified 08 Oct 2010

The Working Group has agreed to adjust the XProc syntax one more time. One last time, I sincerely hope.

In designing the XProc syntax, we've been wrestling with a classic problem: how to make the common things easy without making the uncommon things impossible. The most obvious consequence of our efforts to make the common things easy was the introduction of defaults: when two steps appear in sequence, the primary output of the first automatically connects to the primary input of the second; and the primary output of a compound step automatically connects to the primary output of the last step in its subpipeline.
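Here is a minimal sketch of those two defaults. It uses the syntax of the final XProc 1.0 Recommendation, which differs in some details from the drafts current when this was written (the version attribute, for example). The type px:tidy and the choice of steps are illustrative assumptions. Notice that no step connects any port explicitly:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:px="http://example.org/ns/pipelines"
                type="px:tidy" version="1.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <!-- No explicit input: p:xinclude reads the pipeline's primary input -->
  <p:xinclude/>

  <!-- No explicit input: p:delete reads p:xinclude's primary output -->
  <p:delete match="comment()"/>

  <!-- The unbound "result" port reads the primary output of p:delete,
       because it is the last step in the subpipeline -->
</p:declare-step>
```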

Unfortunately, if the last step is itself a pipeline, then you have to examine that pipeline in order to tell whether it has a primary output, and if its last step is also a pipeline, you have to examine that one too. Suddenly, the static analysis of a pipeline requires an essentially unbounded amount of computation.

Our first reaction to this problem was to simply remove the defaults from p:pipeline. Problem solved, but only at the expense of some syntactic simplicity in the common case.

Subsequently, we realized that there were other issues (mostly related to name attributes), and another solution was proposed: one that restores the p:pipeline defaults for the simplest case and arguably provides a cleaner model for the language.

I'm not sure how to quantify the cost of this proposal. In some sense it makes the language feel more formal and I worry that it'll be harder for new users to write their first “uncommon” pipeline. Nevertheless, after some debate, we decided to give it a try.

The heart of the new proposal is that we use p:declare-step consistently to declare new steps, even new steps that are defined by a subpipeline.

So where you would once have written:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            name="main">
  <p:input port="source" primary="true"/>
  <p:input port="stylesheet"/>
  <p:output port="result"/>

  <!-- some steps -->
</p:pipeline>

you now write:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:px="http://example.org/ns/pipelines"
                type="px:main">
  <p:input port="source" primary="true"/>
  <p:input port="stylesheet"/>
  <p:output port="result"/>

  <!-- some steps -->
</p:declare-step>

Not much different, really: a different element name, and the local name of the step type is used as the name of the pipeline.
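To show what the type attribute buys you, here is a sketch of another pipeline invoking px:main as an ordinary step, written in final XProc 1.0 syntax. The type px:caller and the file names main.xpl and style.xsl are hypothetical; main.xpl is assumed to hold the px:main declaration above.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:px="http://example.org/ns/pipelines"
                type="px:caller" version="1.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <!-- Hypothetical file containing the px:main declaration -->
  <p:import href="main.xpl"/>

  <!-- px:main's primary "source" port defaults to this pipeline's input;
       only the non-primary "stylesheet" port needs an explicit binding -->
  <px:main>
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
  </px:main>
</p:declare-step>
```

Because "result" is the only output port of px:main, it is primary, so the caller's own "result" port can default to it.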

Having expanded p:declare-step so that it provides all of the functionality of p:pipeline, we can now reintroduce p:pipeline as nothing more (or less) than syntactic sugar for the following:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                ... p:pipeline attributes ... >
  <p:input port="source" primary="true" sequence="false"/>
  <p:output port="result" primary="true" sequence="false"/>

  ... p:pipeline contents ...
</p:declare-step>

This makes p:pipeline a very convenient shortcut for the overwhelmingly common case of a pipeline that has exactly one input and exactly one output.

With that in mind, the first example above could also be written like this:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:px="http://example.org/ns/pipelines"
            type="px:main">
  <p:input port="stylesheet"/>

  <!-- some steps -->
</p:pipeline>

But I'm not sure that's really easier to understand than the p:declare-step version.

Here are some of the features of the change, as I see them:

The language is more regular: all step types are declared with p:declare-step. Any step type can be run as a top-level step.
A pipeline processor is now expected to be able to evaluate a top-level p:declare-step as well as a top-level p:pipeline. It becomes natural to allow any step type to be evaluated directly.
The p:pipeline-library (now renamed just p:library) is more regular: it can contain only p:import and p:declare-step elements.
Because a pipeline declaration can contain other declarations, it's now possible to write modular pipelines without exposing the modules in an external library.
Names are now applied more uniformly to step instances.
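The point about modular pipelines can be sketched as follows, again in final XProc 1.0 syntax. The types px:publish and px:strip-comments are hypothetical. The helper step is declared inside px:publish, so it can be used there but is not part of any external library.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:px="http://example.org/ns/pipelines"
                type="px:publish" version="1.0">
  <p:input port="source" primary="true"/>
  <p:output port="result" primary="true"/>

  <!-- A private helper step, visible only inside px:publish -->
  <p:declare-step type="px:strip-comments">
    <p:input port="source" primary="true"/>
    <p:output port="result" primary="true"/>
    <p:delete match="comment()"/>
  </p:declare-step>

  <!-- The subpipeline: expand inclusions, then use the helper -->
  <p:xinclude/>
  <px:strip-comments/>
</p:declare-step>
```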

The biggest cost, I think, is that there's now a “step” in the learning curve, no pun intended. You used to be able to start with a simple p:pipeline and work your way slowly up to more complex pipelines by adding new features. Now there comes a point where you have to change gears and rename the top-level element to p:declare-step. Still, maybe that's not so hard.
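One concrete place where that shift happens: according to the expansion above, the implicit source port of p:pipeline has sequence="false", so a pipeline that must accept a sequence of documents has to declare its ports with p:declare-step. Here is a sketch in final XProc 1.0 syntax; px:collect and the wrapper name collection are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:px="http://example.org/ns/pipelines"
                type="px:collect" version="1.0">
  <!-- sequence="true": accept zero or more documents, which the
       implicit source port of p:pipeline does not allow -->
  <p:input port="source" primary="true" sequence="true"/>
  <p:output port="result" primary="true"/>

  <!-- Wrap all incoming documents in a single <collection> element -->
  <p:wrap-sequence wrapper="collection"/>
</p:declare-step>
```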

A net win? I hope so.
