Not exactly XProc

Volume 12, Issue 23; 23 Jun 2009; last modified 08 Oct 2010

One advantage of being an implementor is that I can play with languages that the Working Group didn't approve.

I've implemented a number of XProc extensions, and have plans for at least a few more, but so far they've all used standard extension mechanisms.

On the train ride home Monday night, I decided to do something different. Implementor's prerogative.

The XProc specification states that all variables, options, and parameters are string values. On the whole, I think this is a useful simplification:

  • All of the options used by the standard atomic steps have convenient string representations: they don't need more complex structures.

  • In an XPath 1.0 implementation there are only a few data types anyway (remember, there was a time when we thought we might finish before the XSLT/XQuery WGs). [Ah, optimism! -ed]

  • Using strings simplifies serialization issues for steps like p:parameters.

But it's frustrating in one particular area: XSLT parameters and XQuery external variables can have more complex values. Because XProc doesn't support this, there are some stylesheets and queries that can't be fully supported by XProc.

Early on, I proposed that we allow parameters at least to contain either strings or documents, but I couldn't get working group support for the idea. (I think they'll come around, but not in 1.0.)

I've wondered, ever since my idea got left on the cutting room floor, how hard it would be to support arbitrary XDM values in XProc.

So I implemented it.

Turns out it's not very hard at all. I extended the RuntimeValue object to preserve the original XDM value of the expression instead of discarding it after computing its string value. In p:xslt and p:xquery, instead of using the string value for parameters and external variables, respectively, I use the XDM value. Everywhere else, I continue to use the string value so this change has no impact on other atomic steps.

In compound steps, I made a change analogous to the changes for p:xslt and p:xquery: when setting up the environment for evaluating XPath expressions, I use the XDM values of options and variables instead of the string values. This means that user-defined pipelines can accept and use XDM values.
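Here is a minimal sketch of what that change makes possible. It is not taken from the implementation. It assumes the document on the source port has a config root element with a name child, and that the extension is enabled. The version attribute follows the final XProc 1.0 Recommendation.

```xml
<p:declare-step name="main" version="1.0"
		xmlns:p="http://www.w3.org/ns/xproc">
<p:input port="source"/>
<p:output port="result"/>

<!-- With the general values extension, $cfg holds the config element
     itself. In standard XProc it would hold only its string value. -->
<p:variable name="cfg" select="/config"/>

<!-- Because $cfg is still a node, the expression can navigate into it.
     In standard XProc, $cfg is a string, so $cfg/name should fail:
     a path step cannot be applied to a string. -->
<p:add-attribute match="/*" attribute-name="name">
  <p:with-option name="attribute-value" select="$cfg/name"/>
</p:add-attribute>

</p:declare-step>
```

p:add-attribute is an ordinary atomic step, so it still receives the string value of the selected name element. Only the XPath evaluation inside the pipeline sees the node.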

The hardest part, by far, was changing the p:parameters step and the interpretation of c:param-set documents to support an extended serialization for arbitrary XDM values.
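For reference, this is the standard serialization that had to be extended, as the XProc 1.0 specification defines it (the parameter-set element there is c:param-set). The parameter names and values are just the ones from the configuration example in this post. The extended form is not shown here because I would only be guessing at its details.

```xml
<c:param-set xmlns:c="http://www.w3.org/ns/xproc-step">
  <!-- Each value is an attribute, so it can only ever be a string. -->
  <c:param name="name" value="value"/>
  <c:param name="name2" value="value2"/>
</c:param-set>
```

An attribute cannot carry a node or a sequence, which is why supporting arbitrary XDM values needs a different serialization.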

All of which means that you can do things like this:

<p:declare-step name="main"
		xmlns:p="http://www.w3.org/ns/xproc"
		xmlns:cx="http://xmlcalabash.com/ns/extensions">
<p:output port="result"/>
<p:serialization port="result" indent="true"/>

<p:input port="config" primary="false">
  <p:inline>
    <config>
      <name>value</name>
      <name2>value2</name2>
      <fragment>
	<doc>
	  <p>Some fragment. How doc/p is useful
	  in a configuration file, I don't know.
	  </p>
	</doc>
      </fragment>
    </config>
  </p:inline>
</p:input>

<p:declare-step type="cx:foo">
  <p:output port="result"/>

  <!-- This is silly, never do this. -->
  <p:option name="param-seq" required="true"/>

  <p:xslt template-name="cx:main">
    <p:input port="source">
      <p:empty/>
    </p:input>
    <p:input port="stylesheet">
      <p:inline>
	<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
			version="2.0">

	  <xsl:param name="name"/>
	  <xsl:param name="name2"/>
	  <xsl:param name="fragment"/>

	  <xsl:template name="cx:main">
	    <cx:doc>
	      <name><xsl:copy-of select="$name"/></name>
	      <name2><xsl:copy-of select="$name2"/></name2>
	      <frag><xsl:copy-of select="$fragment"/></frag>
	    </cx:doc>
	  </xsl:template>
	</xsl:stylesheet>
      </p:inline>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
    <p:with-param name="name" select="$param-seq[1]">
      <p:empty/>
    </p:with-param>
    <p:with-param name="name2" select="$param-seq[2]">
      <p:empty/>
    </p:with-param>
    <p:with-param name="fragment" select="$param-seq[3]">
      <p:empty/>
    </p:with-param>
  </p:xslt>
</p:declare-step>

<p:variable name="cfg1" select="/config/name">
  <p:pipe step="main" port="config"/>
</p:variable>

<p:variable name="cfg2" select="string(/config/name2)">
  <p:pipe step="main" port="config"/>
</p:variable>

<p:variable name="cfgfrag" select="/config/fragment/*">
  <p:pipe step="main" port="config"/>
</p:variable>

<cx:foo>
  <p:with-option name="param-seq"
		 select="($cfg1,$cfg2,$cfgfrag)">
    <p:empty/>
  </p:with-option>
</cx:foo>

</p:declare-step>

The param-seq option of our user-defined cx:foo step expects a sequence (even though this is a silly thing to do in this case).

We extract items from this sequence to establish the values of the stylesheet parameters.

Back out in our main pipeline, we extract values from the configuration file and store them in variables. (We don't have to do this, of course; we could have computed the sequence directly with XPath expressions.)

Pay particular attention to the first value. This XPath expression selects a node; in standard XProc, this would automatically become a string. Using the general values extension, this will remain a node, which may not be what was intended.

The second value uses string() to explicitly make the parameter into a string. The third value also selects a node.

Finally, we pass all of these values to the cx:foo step as a sequence. In standard XProc, this sequence would be collapsed into a single string value, but it will remain a sequence if we use the general values extension.

Run through a standard XProc processor, here is the expected result:

<cx:doc xmlns:cx="http://xmlcalabash.com/ns/extensions">
   <name>valuevalue2
	  Some fragment. How doc/p is useful
	  in a configuration file, I don't know.
	  
	</name>
   <name2/>
   <frag/>
</cx:doc>

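We get the string value of all the variables, options, and parameters, with the param-seq option collapsed to a single string value. That one string is the only item in $param-seq, so $param-seq[1] returns all of it, while $param-seq[2] and $param-seq[3] select nothing. That's why name2 and frag come out empty.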
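The same XPath rule can be checked in isolation. The stylesheet below is only an illustration, not part of the pipeline: the param-seq default is a made-up stand-in for the collapsed string, and the template is meant to be run as the initial template (main) in any XSLT 2.0 processor.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
		version="2.0">

  <!-- Assumption: a stand-in for the single collapsed string that a
       standard processor passes as param-seq. -->
  <xsl:param name="param-seq" select="'valuevalue2 Some fragment.'"/>

  <xsl:template name="main">
    <check>
      <!-- A lone string is a sequence of length one. -->
      <count><xsl:value-of select="count($param-seq)"/></count>
      <!-- [1] selects the whole string. -->
      <first><xsl:value-of select="$param-seq[1]"/></first>
      <!-- [2] selects nothing, so this element is empty. -->
      <second><xsl:value-of select="$param-seq[2]"/></second>
    </check>
  </xsl:template>

</xsl:stylesheet>
```

The count is 1, first holds the entire string, and second is empty, which mirrors the name, name2, and frag elements in the standard result.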

But if we enable the general values extension (by passing -X general-values on the command line, with XML Calabash version 0.9.12), we get a different result:

<cx:doc xmlns:cx="http://xmlcalabash.com/ns/extensions">
   <name>
      <name>value</name>
   </name>
   <name2>value2</name2>
   <frag>
      <doc>
	        <p>Some fragment. How doc/p is useful
	  in a configuration file, I don't know.
	  </p>
	     </doc>
   </frag>
</cx:doc>

Here our sequence has been passed successfully and each of the individual values has been preserved all the way through to XSLT.

Important

With the general values extension, XML Calabash does not implement XProc 1.0! It implements a closely related, but entirely non-standard language which you cannot expect to interoperate with other implementations.

There are still a few obvious weaknesses in this extension.

  1. Implementing a non-standard extension is a bad thing. I probably should disable it completely.

  2. There should be a mechanism (an as attribute, probably) to selectively enable this behavior. This would also allow for type-checking the values passed around.

  3. The serialization used by p:parameters is incompletely supported. Although the serialization identifies the type of atomic values, the code which interprets this serialization ignores the types. Integers may go in, but strings come out.
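To make the second weakness concrete, here is one way an as attribute might look. This is purely hypothetical syntax: XProc 1.0 has no such attribute, nothing like it is implemented in the extension described here, and the label option is invented for contrast. The p and cx prefixes are assumed to be bound as in the pipeline above.

```xml
<p:declare-step type="cx:foo">
  <p:output port="result"/>

  <!-- Hypothetical: "as" opts this option in to general values and
       states the type a processor could check. -->
  <p:option name="param-seq" required="true" as="item()*"/>

  <!-- No "as" attribute: an ordinary XProc string option. -->
  <p:option name="label" required="true"/>

  <p:identity>
    <p:input port="source">
      <p:inline><doc/></p:inline>
    </p:input>
  </p:identity>
</p:declare-step>
```

A standard processor should reject the unknown attribute, so a pipeline written this way could not be mistaken for an interoperable one.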

This is an experimental feature. It may or may not survive over the long run. Comments most welcome.

Remember: if you enable this extension, you are not running a conformant XProc processor. Your gun, your bullet, your foot.

Comments

A very interesting exercise indeed. What should we take from it? Being an implementer myself, my first reaction was... mild terror: There is no way of stopping Norm from making Calabash the hottest XProc processor out there :)

My main problem with this is - and you made it clear several times in the text - interoperability. That you made this feature optional does not mean that people will not be using it. Once you give them a feature that will make certain things easier (or possible at all), I can't imagine they would resist, just because they should honour standards. And when that happens, other processors are out of the game. Of course they can then start a game of their own, but...

XProc is so much fun to implement - and to extend. I would be the first (or close second) to testify that: Calumet, our processor, is made to be extensible, and its API almost encourages programmers to write plug-ins to provide extension functionality. So I place myself at risk here, too. I have done my best to stay on the good side of the force by supporting only the "standard" ways of extending XProc (extension steps, support for additional URI schemes, etc.), but there is a certain danger that things might go awry if evil programmers really wanted to.

What you just showed is a way to extend or improve (some may think: fix) the core language itself. It makes Calabash a non-conformant XProc processor. Or, as you put it, a processor of a different language - but a language that happens to use the same namespace as XProc and thus some of the pipelines may run with other processors (sometimes they run fine, sometimes fail, and sometimes they produce different results). Interoperability hell.

Suppose the most likely - at least for me - scenario in which the WG agrees there is some merit in your proposal, but that things are not going to change for V1. In that case, is there a way out of this for Calabash? I think there is, and it is actually based on the standard XProc versioning mechanism.

The specification does not explicitly mention this possibility, but in theory, Calabash could invent its own (Calabash-specific) XProc version, something like: http://xmlcalabash.com/ns/extensions/xproc-1.5.xpl. You could then create the following pipeline:

<p:pipeline>
  <p:import href="http://xmlcalabash.com/ns/extensions/xproc-1.5.xpl"/>
  ...
</p:pipeline>

In Calabash, using this XProc version would mean that your "general-values" feature would be enabled. Other processors would reject this pipeline because they would not support this XProc version. Probably not the best solution, but in my opinion, still better than an ad-hoc command-line switch.

—Posted by Vojtěch Toman on 24 Jun 2009 @ 09:30 UTC #

That's an interesting idea, Vojtěch. I was thinking that the pipelines ought to be non-conformant in some way too, to discourage attempts to use them interoperably. Rather than the import trick, I think I'm inclined to change the syntax in a different way, adding an "as" attribute to variable, option, and parameter elements. That'd be non-standard too, so other processors would reject the pipeline.

What I think will really happen though is that I'll turn the feature off again in a release or two. The overwhelming majority of pipelines don't need the feature and the last thing I want is a landscape where XProc implementations aren't interoperable.

As we go forward, we'll see what users ask for and what problems they encounter. If we all have users who need the same thing, then we can use the exproc.org (or expath.org) venues to develop extensions cooperatively.

If XProc is successful enough and popular enough (and I think all the available evidence suggests it will be) to warrant a V.next, I think this feature is a very good candidate for standardizing in that version.

But let's get this version to Recommendation before we worry about it too much!

—Posted by Norman Walsh on 25 Jun 2009 @ 11:20 UTC #

'end running' around options/params/vars is just an expression that these elements really do not go far enough to do the job for what people probably want/expect.

since you reopened this can of worms I will express how I would like to see things:

* rationalize options/params/variables into one element and keep them as strings ... these elements seem like a compromise

* allow input bindings to be accessible from inside steps that want to access them (p:xslt, p:xquery)

for example:

<p:declare-step type="p:xslt" xml:id="xslt" xmlns:xproc="http://xproc.net/xproc" xproc:support="true">
  <p:input port="source" sequence="true" primary="true" select="/"/>
  <p:input port="stylesheet" primary="false" select="/"/>
  <p:input port="parameters" primary="false" kind="parameter" select="/"/>
  <!-- I would allow this //-->
  <p:input port="*" primary="false"/>
  <p:output port="result" primary="true" select="/"/>
  <p:output port="secondary" primary="false" sequence="true" select="/"/>
  <p:option name="initial-mode"/>
  <p:option name="template-name"/>
  <p:option name="output-base-uri"/>
  <p:option name="version"/>
</p:declare-step>

then you could just bring in xml as a binding

<p:xslt>
  <p:input port="source">
    <p:empty/>
  </p:input>
  <p:input port="name"/>
  <p:input port="name2"/>
  <p:input port="fragment"/>
  <p:input port="stylesheet">
    <p:inline>
      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
        <xsl:param name="name"/>
        <xsl:param name="name2"/>
        <xsl:param name="fragment"/>
        <xsl:template name="cx:main">
          <cx:doc>
            <name><xsl:copy-of select="$name"/></name>
            <name2><xsl:copy-of select="$name2"/></name2>
            <frag><xsl:copy-of select="$fragment"/></frag>
          </cx:doc>
        </xsl:template>
      </xsl:stylesheet>
    </p:inline>
  </p:input>
</p:xslt>

otherwise I am with Vojtěch in that such behavior (esp. now with the spec so precariously close) should reuse XProc's own reuse mechanisms.

interesting stuff though

—Posted by Jim Fuller on 25 Jun 2009 @ 02:24 UTC #