<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="pto" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#">
<info>
    
    
    
    
    
    
    
    
    
<title>XInclude, xml:base, and validation</title><biblioid class="uri">http://norman.walsh.name/2005/04/01/xinclude</biblioid>
<volumenum>8</volumenum>
<issuenum>47</issuenum>
<pubdate>2005-04-01T10:21:57-05:00</pubdate>
<date>$Date: 2006-07-14 10:06:19 -0400 (Fri, 14 Jul 2006) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2005</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>It turns out that there's a nasty interaction between XInclude,
xml:base, and validation. Update: I was wrong. The interaction is real,
but it didn't go unnoticed.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#W3C"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
</info>

<epigraph>
<attribution>
      <personname>
	<firstname>E.</firstname>
<surname>Dijkstra</surname>
      </personname>
    </attribution>
<para xml:id="p2">Testing can show the presence of errors, but
not their absence.
</para>
</epigraph>

<para xml:id="p1">I doubt this is news; it has drifted across my radar
a couple of times in the past month or two. It turns out there's a nasty
interaction between
<link xlink:href="http://www.w3.org/TR/xinclude/">XInclude</link>,
<link xlink:href="http://www.w3.org/TR/xmlbase/">xml:base</link>, and
<link xlink:href="http://en.wikipedia.org/wiki/Xml#Valid_documents">validation</link>.
</para>

<para xml:id="p3">Consider the following documents:</para>

<example>
<title>http://norman.walsh.name/2005/04/01/examples/enbook.xml</title>
<programlisting>
      <textobject>
<textdata fileref="examples/enbook.xml"/>
</textobject>
    </programlisting>
</example>

<example>
<title>http://norman.walsh.name/2005/04/01/examples/chapters/chap01.xml</title>
<programlisting>
      <textobject>
<textdata fileref="examples/chapters/chap01.xml"/>
</textobject>
    </programlisting>
</example>

<para xml:id="p4">Is the book valid? Well, let's see. Entities are expanded by the
parser, so that book is structurally equivalent to this document:</para>

<informalexample>
<programlisting>
      <textobject>
<textdata fileref="examples/enbook-ex.xml"/>
</textobject>
    </programlisting>
</informalexample>

<para xml:id="p5">And that document is valid DocBook NG “IPA”. So the answer is yes.</para>

<para xml:id="p6">Next, if I tell you that the
<tag class="attribute">fileref</tag> attribute is resolved against the
current base URI, can you tell me the URI of that graphic? Did you get
<uri>http://norman.walsh.name/2005/04/01/examples/chapters/picture.png</uri>?
I knew you could. The point is that expanding an entity preserves the
base URI of that entity.</para>

<para xml:id="p7">Now let's drag this document into the
twenty-first century.</para>

<example>
<title>http://norman.walsh.name/2005/04/01/examples/xibook.xml</title>
<programlisting>
      <textobject>
<textdata fileref="examples/xibook.xml"/>
</textobject>
    </programlisting>
</example>

<para xml:id="p8">Is the book valid? No, because the DocBook NG schema doesn't allow
XInclude elements. Oh, you meant <emphasis>after</emphasis> XInclude
processing. (I'll resist a long rant about processing models
<link xlink:href="/2004/06/20/pipelines">by reference</link>.)
Well, let's see. After XInclude expansion, we'll get:</para>

<informalexample>
<programlisting>
      <textobject>
<textdata fileref="examples/xibook-ex.xml"/>
</textobject>
    </programlisting>
</informalexample>

<para xml:id="p9">So the answer is, “it depends”. You did notice the extra
<tag class="attribute">xml:base</tag> attribute, didn't you? Specifically,
the answer depends on whether or not DocBook NG “IPA” allows
<tag class="attribute">xml:base</tag> to appear on
<tag>chapter</tag>. It does, so the answer is yes.</para>

<para xml:id="p10">This time, I'm sure you can tell me that the URI of that graphic is
<uri>http://norman.walsh.name/2005/04/01/examples/chapters/picture.png</uri>,
because the base URI is explicit.</para>
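
<para>For readers without the example files to hand, the shape of the
situation can be sketched roughly as follows. This is an illustrative
fragment, not the actual content of the listings above: the titles, the
<tag>mediaobject</tag> markup, and the relative
<tag class="attribute">fileref</tag> value are all assumptions chosen to
match the URIs discussed in this essay. Before XInclude processing, the
book might look something like this:</para>

<informalexample>
<programlisting><![CDATA[<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>An Illustrative Book</title>
  <!-- Pulls in the chapter, relative to the book's own base URI -->
  <xi:include href="chapters/chap01.xml"/>
</book>]]></programlisting>
</informalexample>

<para>After expansion, the included chapter carries the attribute in
question. The value is shown here as an absolute URI; a given processor
may serialize it differently.</para>

<informalexample>
<programlisting><![CDATA[<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>An Illustrative Book</title>
  <!-- xml:base was added by the XInclude processor, not by the author -->
  <chapter xml:base="http://norman.walsh.name/2005/04/01/examples/chapters/chap01.xml">
    <title>An Illustrative Chapter</title>
    <mediaobject>
      <imageobject>
        <!-- Resolves against the chapter's base URI, not the book's -->
        <imagedata fileref="picture.png"/>
      </imageobject>
    </mediaobject>
  </chapter>
</book>]]></programlisting>
</informalexample>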

<para xml:id="p11">The problem is that lots and lots of schemas out there, maybe
some that you're responsible for, don't allow <tag class="attribute">xml:base</tag> to appear <emphasis>anywhere</emphasis>.
Every document that XInclude expands picks up an attribute that those
schemas reject, so XInclude is
fundamentally incompatible with all those schemas in the presence of
validation.</para>

<section xml:id="ugh">
<title>Ugh.</title>

<para xml:id="p12">In the short term, I think there's only one answer: update your
schemas to allow <tag class="attribute">xml:base</tag> either (a)
everywhere or (b) everywhere you want XInclude to be allowed. I
urge you to put it everywhere, as your users are likely to want to do things
you never imagined.</para>
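
<para>For a RELAX NG grammar, option (a) might look something like the
following sketch. The file name <filename>existing-schema.rng</filename>
and the pattern name <code>common.attributes</code> are hypothetical; they
stand for your current schema and for whatever pattern it already
references from every element. If your schema has no such shared pattern,
this approach doesn't apply as written.</para>

<informalexample>
<programlisting><![CDATA[<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Hypothetical: the existing schema, assumed to define a
       common.attributes pattern that every element references. -->
  <include href="existing-schema.rng"/>

  <!-- Interleave an optional xml:base into that shared pattern, so the
       attribute XInclude adds is accepted wherever it lands. -->
  <define name="common.attributes" combine="interleave">
    <optional>
      <attribute name="xml:base">
        <data type="anyURI"/>
      </attribute>
    </optional>
  </define>
</grammar>]]></programlisting>
</informalexample>

<para>Extending one shared pattern, rather than touching each element
definition, is what makes “everywhere” cheap in this sketch.</para>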

<para xml:id="p13">Longer term, I've heard a couple of possible solutions. One is to change
(W3C XML) schema validation so that attributes in the <code>xml:</code>
namespace are silently allowed everywhere (just like attributes from
the <uri>http://www.w3.org/2001/XMLSchema-instance</uri> namespace).
That isn't a very attractive answer to me; I think it's a flaw in W3C XML
Schema that
I can't control where the schema instance attributes can occur. But it
would allow you to use XInclude with all those schemas that you have no
power to update.</para>
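
<para>As things stand, nothing in the <code>xml:</code> namespace is
allowed implicitly, so a W3C XML Schema has to opt in declaration by
declaration. A minimal sketch, assuming a hypothetical
<tag>chapter</tag> element in a hypothetical target namespace, might look
like this. The import pulls in the schema that the W3C publishes for the
XML namespace, which is what makes <tag class="attribute">xml:base</tag>
available to reference.</para>

<informalexample>
<programlisting><![CDATA[<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/ns/book"
           elementFormDefault="qualified">
  <!-- Make the declarations for xml:base, xml:lang, etc. available -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- Hypothetical element: each type that may be the target of an
       XInclude needs the attribute reference added explicitly. -->
  <xs:element name="chapter">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
      <xs:attribute ref="xml:base"/>
    </xs:complexType>
  </xs:element>
</xs:schema>]]></programlisting>
</informalexample>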

<para xml:id="p14">Another possibility, though I haven't heard it suggested with any
seriousness, would be to update XInclude so that it doesn't add
<tag class="attribute">xml:base</tag> attributes to the included document.
The sad thing is, that attribute is a bit of a “belt and
suspenders” approach to the base URI. The included document has a 
base URI property in the Infoset and that property has the correct value.
We didn't have to add the attribute. Except maybe we did, because without
the attribute, the correct base URI wouldn't survive serialization (as
might occur, for example, if you packaged the document up and shipped
it off to some web service in a <link xlink:href="http://en.wikipedia.org/wiki/Simple_Object_Access_Protocol">SOAP</link>
envelope).</para>

<para xml:id="p15">In any event, even if we could remove it, removing
it would be backwards incompatible, so it would be awfully painful to do.
And it wouldn't magically fix all the running code out there. And it would
seriously inconvenience folks who need to ship documents around after
XInclude expansion.</para>

<para xml:id="p21">Updated, 04 Apr 2005.</para>

<para xml:id="p16">I originally said that what pained me most about
this situation was that XInclude was in development for just over
<emphasis>five years</emphasis>. It went through <emphasis>eleven
drafts</emphasis><footnote>
	<para xml:id="p17">Evolving at an average
rate of just over 37 words a day, if you count all the words in all
the public drafts.</para>
      </footnote>, including
<emphasis>three</emphasis> Candidate Recommendations.</para>

<para xml:id="p18">I went on to ask why no one noticed until after
XInclude became a Recommendation. And I was just wrong. The problem
was noticed, and the decisions taken were deliberate. This is a good
thing. It means the process didn't fail in a spectacular fashion. It's
embarrassing to screw up in public, but the embarrassment at hand is
insignificant (though much more acutely personal) compared to what I
originally feared.</para>

<para xml:id="p22">The informality of this medium allows me to write
quickly, sometimes too quickly. I've said stupid things before,
I'm bound to say them again, though this incident is likely to make me
a little more careful.</para>

<para xml:id="p19">In the original version of this essay, I went on to
compare the length of XInclude (8,563 “words” according to my word
counting script) with the length of the XSL/XML Query family of
specifications (clocking in at 505,779, just over a half million).
The point of my comparison is no longer relevant, but I'll leave the
numbers.</para>

<para xml:id="p20">P.S. I am still
<link xlink:href="http://en.wikipedia.org/wiki/April_fool%27s_day">not
kidding</link>, though perhaps I wish I could say I had been.
</para>
</section>
</essay>

