<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
    
    
    
    
    
    
    
    
    
    
    
<title>Not exactly XProc</title><biblioid class="uri">http://norman.walsh.name/2009/06/23/notXProc</biblioid>
<volumenum>12</volumenum>
<issuenum>23</issuenum>
<pubdate>2009-06-23T18:27:55-04:00</pubdate>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2009</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>One advantage of being an implementor is that I can play with
languages that the Working Group didn't approve.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#Calabash"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#W3C"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XProc"/>
</info>

<para xml:id="p1">I've implemented a number of
<wikipedia page="XML_pipeline">XProc</wikipedia> extensions,
and have plans for at least a
<link xlink:href="http://markmail.org/message/s7puhxcez2tmmq4f">few</link>
<link xlink:href="http://markmail.org/message/jnsqrrv5xazervre">more</link>,
but so far they've all used standard extension mechanisms.</para>

<para xml:id="p2">On the train ride home Monday night, I decided to do something
different. Implementor's prerogative.</para>

<para xml:id="p3">The <link xlink:href="http://www.w3.org/TR/xproc/">XProc</link>
specification states that all variables, options, and parameters are
string values. On the whole, I think this is a useful simplification:
</para>

<itemizedlist>
<listitem>
<para xml:id="p4">All of the options used by the standard atomic steps have convenient
string representations: they don't need more complex structures.
</para>
</listitem>
<listitem>
<para xml:id="p5">In an XPath 1.0 implementation there are only a few data types
anyway (remember, there was a time when we thought we might finish
before the
XSLT/XQuery WGs). [Ah, optimism! -ed]
</para>
</listitem>
<listitem>
<para xml:id="p6">Using strings simplifies serialization issues for steps like
<tag>p:parameters</tag>.
</para>
</listitem>
</itemizedlist>

<para xml:id="p7">But it's frustrating in one particular area: XSLT parameters
and XQuery external variables can have more complex values. Because
XProc doesn't support such values, there are some stylesheets
and queries that can't be fully used from a pipeline.</para>
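
<para>As a rough illustration of the kind of stylesheet I mean, here is a
minimal sketch. The parameter names <literal>fragment</literal> and
<literal>widths</literal> are invented for this example; the point is only
that a processor limited to a single string per parameter has no way to
supply an element, or a sequence of several integers.</para>

<informalexample>
<programlisting>&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0"&gt;

  &lt;!-- Hypothetical parameter: expects element nodes, not a string. --&gt;
  &lt;xsl:param name="fragment" as="element()*"/&gt;

  &lt;!-- Hypothetical parameter: expects a sequence of integers. --&gt;
  &lt;xsl:param name="widths" as="xs:integer*"/&gt;

  &lt;xsl:template name="main"&gt;
    &lt;result total="{sum($widths)}"&gt;
      &lt;xsl:copy-of select="$fragment"/&gt;
    &lt;/result&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;</programlisting>
</informalexample>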

<para xml:id="p8">Early on, I proposed that we allow parameters at least to contain
either strings or documents, but I couldn't get working group support
for the idea. (I think they'll come around, but not in 1.0.)</para>

<para xml:id="p9">I've wondered, ever since my idea got left on the cutting room 
floor, how hard it would be to support arbitrary
<link xlink:href="http://www.w3.org/TR/xpath-datamodel/">XDM</link>
values in XProc.</para>

<para xml:id="p10">So I implemented it.</para>

<para xml:id="p11">Turns out it's not very hard at all.
I extended the
<classname>RuntimeValue</classname> object to preserve the original
XDM value of the expression instead of discarding it after
computing its string value. In <tag>p:xslt</tag> and
<tag>p:xquery</tag>, instead of using the string value for parameters
and external variables, respectively,
I use the XDM value. Everywhere else, I continue to use the string
value so this change has no impact on other atomic steps.</para>

<para xml:id="p12">In compound steps, I made a change analogous to the changes for
<tag>p:xslt</tag> and <tag>p:xquery</tag>: when setting up the environment
for evaluating XPath expressions, I use the XDM values of options and
variables instead of the string values. This means that user-defined
pipelines can accept and use XDM values.</para>
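
<para>To make that concrete, here is a small sketch of a user-defined step
that counts the items it was given. The step type
<literal>ex:count-items</literal>, its namespace, and the option name
<literal>items</literal> are all made up for illustration. With the
extension enabled, I would expect the <tag class="attribute">count</tag>
attribute to reflect the number of items passed in; without it, the value
arrives as one string, so an XPath 2.0 processor would count a single
item.</para>

<informalexample>
<programlisting>&lt;p:declare-step type="ex:count-items"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:ex="http://example.com/ns/steps"&gt;
  &lt;p:output port="result"/&gt;

  &lt;!-- Hypothetical option: may hold a sequence when general values are on. --&gt;
  &lt;p:option name="items" required="true"/&gt;

  &lt;p:add-attribute match="/c:result" attribute-name="count"&gt;
    &lt;p:input port="source"&gt;
      &lt;p:inline&gt;&lt;c:result/&gt;&lt;/p:inline&gt;
    &lt;/p:input&gt;
    &lt;!-- count() sees the XDM value of $items, not just its string value. --&gt;
    &lt;p:with-option name="attribute-value" select="count($items)"&gt;
      &lt;p:empty/&gt;
    &lt;/p:with-option&gt;
  &lt;/p:add-attribute&gt;
&lt;/p:declare-step&gt;</programlisting>
</informalexample>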

<para xml:id="p13">The hardest part, by far, was changing the <tag>p:parameters</tag>
step and the interpretation of <tag>c:parameter-set</tag> documents
to support an extended serialization for arbitrary XDM values.</para>

<para xml:id="p14">All of which means that you can do things like this:</para>

<informalexample xml:id="pipeline">
<programlisting>&lt;p:declare-step name="main"
		xmlns:p="http://www.w3.org/ns/xproc"
		xmlns:cx="http://xmlcalabash.com/ns/extensions"&gt;
&lt;p:output port="result"/&gt;
&lt;p:serialization port="result" indent="true"/&gt;

&lt;p:input port="config" primary="false"&gt;
  &lt;p:inline&gt;
    &lt;config&gt;
      &lt;name&gt;value&lt;/name&gt;
      &lt;name2&gt;value2&lt;/name2&gt;
      &lt;fragment&gt;
	&lt;doc&gt;
	  &lt;p&gt;Some fragment. How doc/p is useful
	  in a configuration file, I don't know.
	  &lt;/p&gt;
	&lt;/doc&gt;
      &lt;/fragment&gt;
    &lt;/config&gt;
  &lt;/p:inline&gt;
&lt;/p:input&gt;

&lt;p:declare-step type="cx:foo"&gt;
  &lt;p:output port="result"/&gt;

  &lt;!-- This is silly, never do this. --&gt;
  <co xml:id="seq1"/>&lt;p:option name="param-seq" required="true"/&gt;

  &lt;p:xslt template-name="cx:main"&gt;
    &lt;p:input port="source"&gt;
      &lt;p:empty/&gt;
    &lt;/p:input&gt;
    &lt;p:input port="stylesheet"&gt;
      &lt;p:inline&gt;
	&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
			version="2.0"&gt;

	  &lt;xsl:param name="name"/&gt;
	  &lt;xsl:param name="name2"/&gt;
	  &lt;xsl:param name="fragment"/&gt;

	  &lt;xsl:template name="cx:main"&gt;
	    &lt;cx:doc&gt;
	      &lt;name&gt;&lt;xsl:copy-of select="$name"/&gt;&lt;/name&gt;
	      &lt;name2&gt;&lt;xsl:copy-of select="$name2"/&gt;&lt;/name2&gt;
	      &lt;frag&gt;&lt;xsl:copy-of select="$fragment"/&gt;&lt;/frag&gt;
	    &lt;/cx:doc&gt;
	  &lt;/xsl:template&gt;
	&lt;/xsl:stylesheet&gt;
      &lt;/p:inline&gt;
    &lt;/p:input&gt;
    &lt;p:input port="parameters"&gt;
      &lt;p:empty/&gt;
    &lt;/p:input&gt;
    &lt;p:with-param name="name" select="$param-seq[1]"&gt;<co xml:id="seq2"/>
      &lt;p:empty/&gt;
    &lt;/p:with-param&gt;
    &lt;p:with-param name="name2" select="$param-seq[2]"&gt;
      &lt;p:empty/&gt;
    &lt;/p:with-param&gt;
    &lt;p:with-param name="fragment" select="$param-seq[3]"&gt;
      &lt;p:empty/&gt;
    &lt;/p:with-param&gt;
  &lt;/p:xslt&gt;
&lt;/p:declare-step&gt;

&lt;p:variable name="cfg1" select="/config/name"&gt;<co xml:id="node1"/>
  &lt;p:pipe step="main" port="config"/&gt;
&lt;/p:variable&gt;

&lt;p:variable name="cfg2" select="string(/config/name2)"&gt;<co xml:id="string"/>
  &lt;p:pipe step="main" port="config"/&gt;
&lt;/p:variable&gt;

&lt;p:variable name="cfgfrag" select="/config/fragment/*"&gt;<co xml:id="node2"/>
  &lt;p:pipe step="main" port="config"/&gt;
&lt;/p:variable&gt;

&lt;cx:foo&gt;
  &lt;p:with-option name="param-seq"
		 select="($cfg1,$cfg2,$cfgfrag)"&gt;<co xml:id="seq3"/>
    &lt;p:empty/&gt;
  &lt;/p:with-option&gt;
&lt;/cx:foo&gt;

&lt;/p:declare-step&gt;</programlisting>
</informalexample>

<para xml:id="p15">The <option>param-seq</option> option<coref linkend="seq1"/> of our
user-defined <literal>cx:foo</literal> step expects a sequence (even though this
is a silly thing to do in this case).</para>

<para xml:id="p16">We extract items from this sequence<coref linkend="seq2"/> to establish
the values of the stylesheet parameters.</para>

<para xml:id="p17">Back out in our main pipeline, we extract values from the
configuration file and store them in variables. (We don't have to do this,
of course; we could have computed the sequence directly with XPath expressions.)
</para>
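
<para>For comparison, a sketch of that direct form: the three variables
disappear and the <tag>p:with-option</tag> reads the
<literal>config</literal> port itself. I'd expect it to behave the same as
the version with variables, but treat it as an untested variation.</para>

<informalexample>
<programlisting>&lt;cx:foo&gt;
  &lt;!-- Same three items as $cfg1, $cfg2, and $cfgfrag, computed in place. --&gt;
  &lt;p:with-option name="param-seq"
                 select="(/config/name,
                          string(/config/name2),
                          /config/fragment/*)"&gt;
    &lt;p:pipe step="main" port="config"/&gt;
  &lt;/p:with-option&gt;
&lt;/cx:foo&gt;</programlisting>
</informalexample>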

<para xml:id="p18">Pay particular attention to the first value<coref linkend="node1"/>.
This XPath expression selects a node; in standard XProc, the node would automatically
be converted to its string value. With the general values extension, it remains a node,
which may not be what was intended.</para>

<para xml:id="p19">The second value<coref linkend="string"/> uses <function>string()</function>
to explicitly make the parameter into a string. The third
example<coref linkend="node2"/> also selects a node.</para>

<para xml:id="p20">Finally, we pass all of these values to the <literal>cx:foo</literal> step
as a sequence<coref linkend="seq3"/>. In standard XProc, this sequence
would be collapsed into a single string value, but it will remain a
sequence if we use the general values extension.</para>

<para xml:id="p21">Run through a standard XProc processor, here is the expected result:</para>

<informalexample xml:id="standard-result">
<programlisting>&lt;cx:doc xmlns:cx="http://xmlcalabash.com/ns/extensions"&gt;
   &lt;name&gt;valuevalue2
	  Some fragment. How doc/p is useful
	  in a configuration file, I don't know.
	  
	&lt;/name&gt;
   &lt;name2/&gt;
   &lt;frag/&gt;
&lt;/cx:doc&gt;</programlisting>
</informalexample>

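<para xml:id="p22">We get the string value of all the variables, options,
and parameters, with the <option>param-seq</option> option collapsed to
a single string value. That single string is the first (and only) item of
<literal>$param-seq</literal>, so everything lands in <tag>name</tag>, while
<literal>$param-seq[2]</literal> and <literal>$param-seq[3]</literal> select
nothing.</para>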

<para xml:id="p23">But if we enable the general values extension (by passing
<literal>-X general-values</literal> on the command line to
<link xlink:href="/2008/projects/calabash">XML Calabash</link> version
0.9.<emphasis>12</emphasis>), we get a
different result:</para>

<informalexample xml:id="extended-result">
<programlisting>&lt;cx:doc xmlns:cx="http://xmlcalabash.com/ns/extensions"&gt;
   &lt;name&gt;
      &lt;name&gt;value&lt;/name&gt;
   &lt;/name&gt;
   &lt;name2&gt;value2&lt;/name2&gt;
   &lt;frag&gt;
      &lt;doc&gt;
	        &lt;p&gt;Some fragment. How doc/p is useful
	  in a configuration file, I don't know.
	  &lt;/p&gt;
	     &lt;/doc&gt;
   &lt;/frag&gt;
&lt;/cx:doc&gt;</programlisting>
</informalexample>

<para xml:id="p24">Here our sequence has been passed successfully and each of the individual
values has been preserved all the way through to XSLT.</para>

<important>
<para xml:id="p25">With the general values extension, XML Calabash <emphasis>does
not</emphasis> implement XProc 1.0! It implements a closely related but
entirely non-standard language that you cannot expect to interoperate
with other implementations.
</para>
</important>

<para xml:id="p26">There are still a few obvious weaknesses in this extension.</para>

<orderedlist>
<listitem>
<para xml:id="p27">Implementing a non-standard extension is a bad thing. I probably
should disable it completely.</para>
</listitem>
<listitem>
<para xml:id="p28">There
should be a mechanism (an <tag class="attribute">as</tag> attribute,
probably) to selectively enable this behavior. This would also allow
for type-checking the values passed around.</para>
</listitem>
<listitem>
<para xml:id="p29">The serialization used by <tag>p:parameters</tag> is incompletely
supported. Although the serialization identifies the type of atomic values,
the code which interprets this serialization ignores the types. Integers
may go in, but strings come out.</para>
</listitem>
</orderedlist>
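
<para>To give the second item some shape, an <tag class="attribute">as</tag>
attribute might look something like the following. This is purely
hypothetical syntax: nothing like it exists in XProc 1.0, and I haven't
implemented it. The option name <literal>title</literal> is invented for
the example.</para>

<informalexample>
<programlisting>&lt;!-- Hypothetical: opt in to general values, and state the expected type. --&gt;
&lt;p:option name="param-seq" required="true" as="item()*"/&gt;

&lt;!-- Hypothetical: with no "as", the value is reduced to a string,
     just as in standard XProc. --&gt;
&lt;p:option name="title" required="true"/&gt;</programlisting>
</informalexample>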

<para xml:id="p30">This is an experimental feature. It may or may not survive over the
long run. Comments most welcome.</para>

<para xml:id="p31">Remember: if you enable this extension, you are not running a conformant
XProc processor. Your gun, your bullet, your foot.</para>

</essay>

