<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
<title>XProc Versioning and Extensibility</title>
<biblioid class="uri">http://norman.walsh.name/2007/11/14/xprocVersioning</biblioid>
<volumenum>10</volumenum>
<issuenum>118</issuenum>
<pubdate>2007-11-14T10:05:13-05:00</pubdate>
<date>$Date$</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2007</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>If you don't plan for extensibility when you're designing
version 1.0 of your language, you often don't get any. I think we
have a plan for XProc now.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#TAG"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XProc"/>
</info>

<para xml:id="p1">The
<wikipedia page="Technical_Architecture_Group">TAG</wikipedia> has been
thinking about
<link xlink:href="http://www.w3.org/2001/tag/group/track/issues/41">versioning</link>
for a long time. <foaf:name>David Orchard</foaf:name> has done most of the
heavy lifting, carrying the ball a long way down the field (if you'll pardon
the mixed metaphor). One of the difficulties, I think, is that if you get
a bunch of engineers in a room to talk about generalities, we all want to test
the generalities against the special cases. In the world of versioning and
extensibility, there seem to be nearly as many special cases as there are
languages.</para>

<para xml:id="p2">The <wikipedia page="XML_pipeline">XProc</wikipedia>
story is its own mixture of big bang evolution, backwards
and forwards compatible evolution, version identifiers, and fallback. This
essay explores that story, with an eye on the
<link xlink:href="http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies-20071113.html">13 Nov 2007 draft</link>
of
<link xlink:href="http://www.w3.org/2001/tag/doc/versioning-compatibility-strategies">Extending and Versioning Languages: Compatibility Strategies</link>.
</para>

<para xml:id="p3">First, a little background. For our purposes today, <link xlink:href="http://www.w3.org/TR/xproc/">XProc</link> is an XML
vocabulary that consists of a small number of “compound steps”,
elements that can contain other steps, and an essentially
unbounded number of “atomic steps”, elements that can't contain
other steps. The term “step”, unqualified, means either a compound step or
an atomic step.</para>

<para xml:id="p4">There's
no <foreignphrase>a priori</foreignphrase> limit on the depth of the
tree that an XProc document represents, but there are only a small
number of element types that can form the trunk and branches. There are
arbitrarily many leaf element types.</para>

<para xml:id="p5">The XProc processor examines that tree and builds an acyclic graph.
The steps are the nodes in that graph. The arcs between
the nodes come mostly from explicit syntax on the steps, but some
are inferred from the structure of the tree and the nature of the
specific steps.</para>
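
<para>A minimal sketch may make the distinction between the tree and the
graph concrete. The step names are the ones used in this essay; the
stylesheet name, the select expression, and the details of the binding
syntax are assumptions made for illustration and may not match the
current draft exactly:</para>

<programlisting>&lt;p:pipeline xmlns:p="http://www.w3.org/ns/xproc"&gt;
  &lt;!-- compound step: trunk or branch, contains other steps --&gt;
  &lt;p:for-each&gt;
    &lt;p:iteration-source select="/book/chapter"/&gt;
    &lt;!-- atomic step: a leaf --&gt;
    &lt;p:xslt&gt;
      &lt;!-- explicit syntax: this arc is spelled out --&gt;
      &lt;p:input port="stylesheet"&gt;
        &lt;p:document href="chapter.xsl"/&gt;
      &lt;/p:input&gt;
      &lt;!-- no binding for the "source" port: that arc is inferred --&gt;
    &lt;/p:xslt&gt;
  &lt;/p:for-each&gt;
&lt;/p:pipeline&gt;</programlisting>

<para>Here the processor has to work out for itself that the
<tag>p:xslt</tag> step reads each selected chapter from the enclosing
<tag>p:for-each</tag>.</para>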

<para xml:id="p6">After it's built the graph, the processor “evaluates” or “executes” it.
Exactly what is entailed in evaluating the graph isn't relevant to this
discussion. Suffice it to say, there are some semantic constraints on
the resulting graph and the processor must be able to interpret it in order
to evaluate it. The language contains conditional constructs that make it
possible for parts of the graph to be ignored.</para>

<para xml:id="p7">If the XProc processor can't build the graph, then it's not a
valid pipeline. In other words, the bare minimum necessary to claim
some degree of forwards compatibility is that a V1.0 processor must be able
to build the graph for a V.next pipeline document. A more useful
degree of forwards compatibility would be to make it possible for a V1.0
processor to evaluate the graph.</para>

<section xml:id="extensible">
<title>Languages should be extensible</title>

<para xml:id="p8">The first good practice in the versioning strategies document is that
languages should be extensible. There are three extensibility points in
XProc. First, but least interesting, is the <tag>p:documentation</tag> element.
You can put anything in there, but it's only documentation. Second is the
list of “ignored namespaces”. Ignored namespaces allow you to embed other content
in your pipeline, RDF assertions, say, or application-specific job control
instructions. But these elements are ignored, so they don't have any bearing
on the graph. Finally, a pipeline can declare additional atomic step
types. These declarations allow a pipeline author to use arbitrary new atomic
steps.</para>
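
<para>Roughly, the three extensibility points might look like this in a
single pipeline. This is only a sketch: the <literal>ex</literal>
namespace, the <tag>ex:tidy</tag> step, and its port names are
hypothetical, the declaration element is assumed to be
<tag>p:declare-step</tag>, and whatever markup tells the processor which
namespaces to ignore is left out:</para>

<programlisting>&lt;p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.com/ns/steps"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"&gt;

  &lt;!-- 1. Documentation: anything goes, but it's only documentation --&gt;
  &lt;p:documentation&gt;
    &lt;para xmlns="http://docbook.org/ns/docbook"&gt;Tidy up the input.&lt;/para&gt;
  &lt;/p:documentation&gt;

  &lt;!-- 2. Content in an ignored namespace: no bearing on the graph --&gt;
  &lt;rdf:RDF&gt;…&lt;/rdf:RDF&gt;

  &lt;!-- 3. Declaration of a new atomic step type (hypothetical) --&gt;
  &lt;p:declare-step type="ex:tidy"&gt;
    &lt;p:input port="source"/&gt;
    &lt;p:output port="result"/&gt;
  &lt;/p:declare-step&gt;

  &lt;!-- ...which the pipeline can then use like any other atomic step --&gt;
  &lt;ex:tidy/&gt;
&lt;/p:pipeline&gt;</programlisting>

<para>Only the third of these contributes a node to the graph.</para>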

<para xml:id="p9">The ability to define arbitrary atomic step types isn't really
about forwards compatibility: it's about extensibility. In fact, it
isn't even sufficient, by itself, to get us over the first
hurdle.</para>

<para xml:id="p10">The problem is that V.next of the language might include a new “built in”
step type. Just as <tag>p:for-each</tag> and <tag>p:xslt</tag> are part of the
V1.0 language and don't require any declaration, <tag>p:dwim</tag> might
be a built in step type in V.next.</para>

<para xml:id="p11">Since <tag>p:dwim</tag> wouldn't require any declaration, a V1.0 processor won't know
anything about how it's connected into the graph and, consequently, won't
be able to build the graph.</para>

<para xml:id="p12">So here's the first part of our versioning story. We're going to establish
the convention that a step library, containing all of the declarations for the
built in atomic steps, will appear under the XProc namespace. So, for version
1.0, we'll provide something like
<uri>http://www.w3.org/ns/xproc/steps10.xpl</uri>. No V1.0 processor is ever
going to import that library, but it won't be an error to request that it
be imported.</para>

<para xml:id="p13">When XProc V.next is published, it will come with a new library:
<uri>http://www.w3.org/ns/xproc/stepsVnext.xpl</uri>. If you want to write
a pipeline that is backwards compatible with a previous version of XProc,
you will explicitly import that library:</para>

<programlisting>&lt;p:pipeline xmlns:p="http://www.w3.org/ns/xproc"&gt;
&lt;p:import href="http://www.w3.org/ns/xproc/stepsVnext.xpl"/&gt;
…
&lt;/p:pipeline&gt;</programlisting>

<para xml:id="p14">A V.next processor will ignore that import statement, but a V1.0 processor
will recognize that it's a new standard library and read it. This will give
the V1.0 processor access to declarations for any new built in steps. With these
declarations in hand, it'll be able to build the graph.</para>
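
<para>What might such a library contain? As a sketch only, assuming the
declaration syntax is <tag>p:declare-step</tag> inside a
<tag>p:library</tag> and inventing the ports for the imaginary
<tag>p:dwim</tag> step, something like:</para>

<programlisting>&lt;p:library xmlns:p="http://www.w3.org/ns/xproc"&gt;
  &lt;!-- Hypothetical declaration of a new V.next built in step.
       The port names are made up; the point is that a V1.0 processor
       learns which inputs and outputs the step has. --&gt;
  &lt;p:declare-step type="p:dwim"&gt;
    &lt;p:input port="source"/&gt;
    &lt;p:output port="result"/&gt;
  &lt;/p:declare-step&gt;
  …
&lt;/p:library&gt;</programlisting>

<para>The declaration says nothing about what <tag>p:dwim</tag> does. It
only names the ports, and that's all the processor needs in order to
connect the step into the graph.</para>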

<para xml:id="p15">But what about compound steps? Suppose V.next introduces a new
<tag>p:map-reduce</tag> compound step. What then?</para>

<para xml:id="p16">Well, then, a V1.0 processor will halt and catch fire on a pipeline that
uses that step. The nature of the decisions we've made regarding how
connections are identified,
and the use of inference to simplify the text of the pipeline document, make
it impossible to determine the connections on a new compound step. So we're
“big bang” with respect to new compound steps.</para>

<para xml:id="p17">But why can't you just ignore the whole element and all of its descendants?
Because in the general case, this would leave nodes “unconnected”
in the graph. Leaving them unconnected results in a graph that can't be
interpreted. Literally treating the document as if the unknown compound step
weren't there would result either in errors (a tree from which it is impossible
to infer the necessary connections) or in a graph that can be evaluated but
possibly performs arbitrarily incorrect operations.</para>
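
<para>To see the second failure mode, consider this sketch. The
<tag>p:map-reduce</tag> step is the imaginary one from above, its content
is elided, and the example assumes that a step with no explicit binding
reads from the step before it:</para>

<programlisting>&lt;p:pipeline xmlns:p="http://www.w3.org/ns/xproc"&gt;
  &lt;p:xinclude/&gt;

  &lt;!-- unknown to a V1.0 processor --&gt;
  &lt;p:map-reduce&gt;
    …
  &lt;/p:map-reduce&gt;

  &lt;!-- meant to read the output of p:map-reduce --&gt;
  &lt;p:xslt&gt;
    …
  &lt;/p:xslt&gt;
&lt;/p:pipeline&gt;</programlisting>

<para>If a V1.0 processor simply dropped the <tag>p:map-reduce</tag>
element, the <tag>p:xslt</tag> step would quietly be connected to the
output of <tag>p:xinclude</tag> instead. The graph could be evaluated,
but it wouldn't be the pipeline the author wrote.</para>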

</section>
<section xml:id="must-accept">
<title>Must accept unknowns</title>

<para xml:id="p18">That takes us to the second good practice: consumers must accept any
text portion they do not recognize. If you interpret “recognize” as meaning
“can evaluate”, then we do OK on this score for atomic steps. After all,
giving a V1.0 processor the declaration for <tag>p:dwim</tag> isn't going
to make the processor “recognize it” for the purpose of evaluation.</para>

<para xml:id="p19">Similarly, for documentation and ignored namespaces, unknowns are
ignored. But when it comes to the semantics of XProc, we can't effectively
ignore arbitrary new elements.</para>

</section>
<section xml:id="preserve-information">
<title>Preserve existing information</title>

<para xml:id="p20">The next three good practice notes have to do with preserving
information. The first states that an extensible language must require
that any texts with extensions be compatible with a text without the
extensions. This is further divided into two cases: establish “compatibility”
by removing the extensions or by preserving them.</para>

<para xml:id="p21">To the extent that we limit our discussion to atomic steps, I
think XProc falls into the “accept by preserving them” category. A
V1.0 processor can build a graph containing a V.next <tag>p:dwim</tag>
step.</para>

<para xml:id="p22">The next two good practices don't really apply. They deal with how
descendants of an unknown element are processed. We've already established
that for unknown compound steps, XProc applies a big bang approach. And
for atomic steps, there aren't any descendants.</para>

</section>
<section xml:id="fallback">
<title>Fallback</title>

<para xml:id="p23">XProc doesn't provide an explicit fallback mechanism, but it
does provide the pipeline author with the tools necessary to construct
one. The combination of a conditional evaluation mechanism and a
function that determines whether or not a particular step can be
evaluated allows the author to write a “backwards compatible” pipeline.
</para>
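
<para>A sketch of what that might look like follows. It assumes the
conditional is <tag>p:choose</tag> and that the function is called
<function>p:step-available</function> (the name may differ in the draft
current at this writing); <tag>p:dwim</tag> is still imaginary and the
fallback stylesheet is made up:</para>

<programlisting>&lt;p:pipeline xmlns:p="http://www.w3.org/ns/xproc"&gt;
  &lt;p:import href="http://www.w3.org/ns/xproc/stepsVnext.xpl"/&gt;

  &lt;p:choose&gt;
    &lt;!-- a V.next processor can evaluate the new step --&gt;
    &lt;p:when test="p:step-available('p:dwim')"&gt;
      &lt;p:dwim/&gt;
    &lt;/p:when&gt;
    &lt;!-- a V1.0 processor falls back to steps it can evaluate --&gt;
    &lt;p:otherwise&gt;
      &lt;p:xslt&gt;
        &lt;p:input port="stylesheet"&gt;
          &lt;p:document href="dwim-fallback.xsl"/&gt;
        &lt;/p:input&gt;
      &lt;/p:xslt&gt;
    &lt;/p:otherwise&gt;
  &lt;/p:choose&gt;
&lt;/p:pipeline&gt;</programlisting>

<para>The import gives a V1.0 processor enough information to build the
graph; the conditional ensures that it never has to evaluate the branch
containing the step it doesn't implement.</para>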

</section>
<section xml:id="unknown-version-identifiers">
<title>Understanding unknown version identifiers</title>

<para xml:id="p24">Perhaps the most interesting part of the XProc versioning and
extensibility story is that it doesn't involve traditional version
numbers. I thought it would. I proposed one. But after we'd worked out
the various strategies for forwards and backwards compatibility, it was
clear that they made no reference to an explicit version identifier.</para>

<para xml:id="p25">That's not absolutely true, of course. The import mechanism that we
establish for providing V1.0 processors with declarations for V.next
built in steps establishes a “version URI”.</para>

<para xml:id="p26">But the semantics for such an unknown version identifier are clear:
read it.</para>

</section>
<section xml:id="backwards-compatibility">
<title>Backwards compatibility</title>

<para xml:id="p27">I haven't said a lot about backwards compatibility, mainly because
it isn't really our problem. Backwards
compatibility is something the V.next language designers have to worry
about with respect to the V.previous version(s).</para>

<para xml:id="p28">But what, you might ask, happens if the V.next designers decide
to change the semantics of an atomic step in some incompatible way?
My first answer is: they don't get to do that. If that's necessary, the
incompatible step must be given a new name or the V.next language will
not have the forwards compatibility that we've attempted to provide.</para>

<para xml:id="p29">My second answer is: there is an escape hatch. One of the design
points that's pretty solidly locked down is attributes in no namespace.
You can add new extension attributes to XProc steps to your heart's content,
but they must be in a namespace. A new, unqualified attribute is backwards
incompatible. A processor that doesn't recognize it must halt and catch
fire.</para>

<para xml:id="p30">So, if the V.next designers really, really need to do this,
<emphasis>they</emphasis> can add a <tag class="attribute">version</tag>
attribute with whatever semantics they want. It'll prevent a V1.0 processor
from attempting to run the V.next pipeline that uses it.</para>
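
<para>A sketch of the two cases, assuming a hypothetical extension
namespace bound to <literal>ex</literal>, a hypothetical
<tag class="attribute">ex:priority</tag> attribute, and a made-up
<tag class="attribute">version</tag> value; none of these come from the
specification:</para>

<programlisting>&lt;!-- Extension attribute in a namespace: fine for a V1.0 processor --&gt;
&lt;p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.com/ns/extensions"
            ex:priority="high"&gt;
…
&lt;/p:pipeline&gt;

&lt;!-- New unqualified attribute: a V1.0 processor must halt and catch fire --&gt;
&lt;p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            version="next"&gt;
…
&lt;/p:pipeline&gt;</programlisting>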
</section>

<section xml:id="summary">
<title>XProc Versioning Summary</title>

<para xml:id="p31">So what is the XProc versioning and extensibility story?</para>

<orderedlist>
<listitem>
<para xml:id="p32">XProc is forwards compatible with respect to new atomic steps.
Pipelines that use new compound steps aren't backwards compatible, but
that doesn't seem too high a price to pay.</para>
<para xml:id="p33">It's possible that we
haven't exhausted all the options; maybe there's a forwards
compatibility story that includes compound steps.</para>
</listitem>
<listitem>
<para xml:id="p34">A pipeline author writing a V.next pipeline can write it to be backwards
compatible with a V1.0 processor (new compound steps excluded).</para>
</listitem>
<listitem>
<para xml:id="p35">To the extent that XProc uses version identifiers, those identifiers
are URIs (which feels sort of good, really) and the semantics of unknown
identifiers are entirely clear.</para>
</listitem>
</orderedlist>

<para xml:id="p36">I think that's a plan.</para>
</section>
</essay>

