XProc and XPath

Volume 10, Issue 119; 15 Nov 2007; last modified 08 Oct 2010

When are two versions better than one? Maybe never, but apparently sometimes two versions are an inevitable compromise.

Way back in the beginning, when we decided that XProc would use XPath as its expression language, we chose XPath 1.0. At the time, XPath 2.0 was not yet a recommendation and, based on the track record of the XPath/XSLT/XQuery specifications, it wasn't impossible to imagine we might get finished first.

We didn't. And ever since XPath 2.0 became a recommendation, there have been suggestions that XProc's choice was a bad one. On a purely personal level, I don't think it was; I think it will turn out to be easy to express the overwhelming majority of XPath expressions that actually occur in real pipeline documents in XPath 1.0.

But implementors and early adopters and other working groups don't really care about such mundane things. Implementors don't want to support an old standard (“we're only planning to implement XPath 2.0, so…”), early adopters want the latest toys (“but I can't write pipelines that test typed joins…”), and working groups, well, they've got their own goals.

We stood our ground for a while, but at our most recent face-to-face, we gave in.

There are three options, really: use XPath 1.0 exclusively, use XPath 2.0 exclusively, or allow authors and implementors the freedom to choose which they want to use.

If we can't use 1.0 only because it's too old, I think it's pretty clear that we can't use 2.0 only because it's too new. There are major platforms that don't have any sort of standard support for XPath 2.0 and we have implementors that don't have, and don't plan to write, XPath 2.0 implementations. (Not least because they know full well that all the pipelines they'll ever need will never require XPath 2.0.)

So we're doing what all working groups do when faced with an impossible choice: we're choosing both.

Implementors can implement what they want. Authors can use what they want. Authors can say what version they're using and implementations must honor it. If authors don't say, then they get what the implementor decides. (That's often going to be fine since expressions that are the same in both environments will work in both environments.)

If the author asks for a version that the implementor supports, it just works like you'd expect.
If the author asks for XPath 1.0 and the implementor uses 2.0 (or later), then the implementation must evaluate the expression in XPath 1.0 compatibility mode. (Implementations that don't implement XPath 1.0 compatibility mode must reject the pipeline.)
If the author asks for XPath 2.0 and the implementor uses 1.0, then the implementation must not evaluate the expression unless it can determine that the result would be the same if it were evaluated with XPath 2.0.
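
The three rules above, plus the "author didn't say" default, can be sketched as a small decision function. This is only an illustration of the prose, not text from any draft: the names `Action` and `decide`, the string version labels, and the two boolean flags are all made up for the example, and treating "must not evaluate" as "reject the pipeline" is an assumption.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical outcomes; the names are illustrative only."""
    EVALUATE = "evaluate with the engine's native XPath version"
    EVALUATE_COMPAT = "evaluate in XPath 1.0 compatibility mode"
    REJECT = "reject the pipeline"


def decide(requested: Optional[str], engine: str,
           has_compat_mode: bool = False,
           same_result_in_2: bool = False) -> Action:
    """Pick an action from the requested and implemented XPath versions.

    requested        -- "1.0", "2.0", or None if the author didn't say
    engine           -- "1.0" or "2.0", the version the implementation uses
    has_compat_mode  -- a 2.0 engine that offers 1.0 compatibility mode
    same_result_in_2 -- a 1.0 engine has determined (somehow) that the
                        expression gives the same result under 2.0
    """
    # No stated version: the author gets what the implementor decides.
    # A matching version: it just works.
    if requested is None or requested == engine:
        return Action.EVALUATE

    # Author asked for 1.0, engine is 2.0: compatibility mode or nothing.
    if requested == "1.0":
        return Action.EVALUATE_COMPAT if has_compat_mode else Action.REJECT

    # Author asked for 2.0, engine is 1.0: only evaluate if the result
    # is known to be the same (assumed here to mean reject otherwise).
    return Action.EVALUATE if same_result_in_2 else Action.REJECT


if __name__ == "__main__":
    print(decide("1.0", "2.0", has_compat_mode=True).value)
    print(decide("2.0", "1.0").value)
```

How a 1.0 engine would actually arrive at `same_result_in_2` is the hard part, and the sketch deliberately leaves it as an input.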

This rule may seem a little crazy, but the working group was determined to be as conservative as possible. In practice, it means that an XPath 1.0 implementation can just reject all pipelines that use XPath 2.0. Or it can decide to accept expressions that don't contain square brackets, or it can try to be even more clever.
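
For what it's worth, the "no square brackets" idea is about as simple as a check can get. The sketch below is a hypothetical helper (the name `has_no_predicates` is invented here) that does a plain character scan; it makes no claim that the absence of brackets actually guarantees identical results in XPath 1.0 and 2.0, only that an implementation could choose to draw its line there.

```python
def has_no_predicates(expr: str) -> bool:
    """Return True if the expression contains no square brackets at all.

    Deliberately crude: a bracket inside a string literal also counts,
    so the check errs on the side of refusing an expression.
    """
    return "[" not in expr and "]" not in expr


if __name__ == "__main__":
    for expr in ("/doc/title", "//item[1]", "contains(., '[')"):
        verdict = "accept" if has_no_predicates(expr) else "refuse"
        print(f"{expr}: {verdict}")
```

An implementation that wanted to "be even more clever" would need to parse the expression rather than scan it.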

I'm suspicious of these rules. I'm suspicious of the whole “use both” strategy, in fact, but there will be a new draft shortly and more opportunities for users and implementors to…make suggestions.

Perhaps I'm worrying too much. One world view says that over time XPath 2.0 will dominate and market pressures will encourage XProc engines to be based on XPath 2.0. So authors will mostly use XPath 2.0, implementations will mostly use XPath 2.0, and everything will be fine. The remaining, presumably diminishing, XPath 1.0-based users will still be conformant, but they'll have less success attempting to interoperate with new implementations.

Comments

Overall this seems like a reasonable compromise when it's not clear how quickly the move to XPath 2.0 is likely to happen.

However, I worry a bit about: “If authors don't say, then they get what the implementor decides.” True, it'll usually work out fine, as you say, but sometimes it may fail in surprising ways. It would have been more controlled to say it's XPath 1.0 unless the pipeline author says otherwise. That way the default would only break with a 2.0 implementation which doesn't do backward-compatibility mode. For many simple XPaths it might be sensible to allow the author to explicitly say that 1.0 and 2.0 are both OK.

As long as the xslt component leaves it up to the implementation whether or not to use an xsl/xpath 2 engine, I think it's very reasonable to limit xpath expressions in xproc to version 1.0.

Thus if you really, really needed something from xpath 2 you could do a transform in the xslt component and have it come up with some sort of routing meta-document that you could use as an input to other pipelines. I think that this is a decent design pattern anyway because if you need xpath 2 you're probably doing something complex and encapsulating the complexity outside of the xproc flow makes sense.

Furthermore, I think that the implementation support simply isn't there yet. Of (what I think are) the big four xsl engines (saxon, msxml, libxml and xalan), only saxon is at the 2.0 level. I think the msxml team has started on it, xalan is still thinking about it, and I haven't heard/seen anything from the libxml camp.

Therefore, I agree that it would be wise to keep it simple and require only xpath 1.0.