<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:dc='http://purl.org/dc/elements/1.1/'
         xmlns:gal='http://norman.walsh.name/rdf/gallery#'>
<info>
<title>xsi:type Train Wreck</title>
<volumenum>7</volumenum>
<issuenum>18</issuenum>
<pubdate>2004-01-29T07:44:00-05:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2004</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>I’ve never liked xsi:type. From my perspective, elements have
declarations and those declarations tell you the type of an element.
</para>
</abstract>
</info>

<epigraph>
<attribution><personname>
<firstname>Sir Walter</firstname><surname>Scott</surname>
</personname></attribution>
<para xml:id='p1'>A rusty nail placed near a faithful compass, will sway it from
the truth, and wreck the argosy.</para>
</epigraph>

<para xml:id='p2'>I’ve never liked <tag class="attribute">xsi:type</tag>.
Whenever I think about it, my intuition whispers softly that any design
that requires it is somehow flawed or at least ugly.</para>

<para xml:id='p3'>From my perspective, elements have declarations and those
declarations tell you the type of an element. I don’t care if the notion
of type is explicitly part of the schema language or not.</para>

<para xml:id='p4'>An <tag class="starttag">address</tag> is an
“address” because somewhere there’s a declaration for that element that
tells you what attributes and children it can have. Things that can
have streets and cities and postal codes are addresses, so that’s what makes an
<tag class="starttag">address</tag> an “address”.</para>

<para xml:id='p5'>Here’s an address type, for example:</para>

<programlisting><![CDATA[<xs:complexType name="Address">
  <xs:sequence>
    <xs:element name="addressee" type="xs:string"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="street" type="xs:string"
                minOccurs="0" maxOccurs="3"/>
    <xs:element name="city" type="xs:string"/>
    <xs:element name="stateOrProvince" type="xs:string"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="postCode" type="xs:string"/>
  </xs:sequence>
</xs:complexType>]]></programlisting>

<para xml:id='p6'>And we might associate that type with a particular element using
the declaration:</para>

<programlisting><![CDATA[<xs:element name="address" type="a:Address"/>]]></programlisting>

<para xml:id='p7'>In the old days, when we used DTDs, there was a one-to-one correspondence
between element names and their declarations. That’s no longer true, but I
still think my premise holds. If there are several declarations for
<tag class="starttag">address</tag>, at least one of them must apply
if the document is valid. (If the document isn’t valid, all bets are off anyway.)
If more than one declaration applies, some schema languages call that an error
and some don’t. If it isn’t an error, I’d probably like a warning about that
in my development environment, but as far as I’m concerned, the tool is free
to pick any one of the declarations (or all of them, if that’s appropriate in
the processing environment).</para>

<!--
<para xml:id='p8'>(“Pick any one” isn’t the only way to deal
with ambiguity; there are other equally valid possibilities. But if you said
that something could be “type1” (either “red”, “green”, or “blue”) or
“type2” (either “red”, “yellow”, or “magenta”) and the document
contains a “red” one, I’d be happy if the validator picked either
“type1” or “type2”. If it mattered, your schema should have provided
some mechanism to distinguish between them. If your processing
environment can report that it matches either “type1” or “type2”, that’s
even better. But if it can only report one, it can pick one
based on the phase of the moon, as far as I’m concerned.)</para>
-->

<para xml:id='p9'>Now, my choice of element names in the address type betrays a desire to handle
international addresses (if I were only concerned with U.S. addresses, I’d have
just named the children <tag class="element">state</tag> and
<tag class="element">zip</tag>). So let’s make
an international address type:</para>

<programlisting><![CDATA[<xs:complexType name="IntlAddress">
  <xs:complexContent>
    <xs:extension base="a:Address">
        <xs:sequence>
          <xs:element name="country" type="xs:string"/>
        </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>]]></programlisting>

<para xml:id='p10'>And associate it with an element:</para>

<programlisting><![CDATA[<xs:element name="intladdress" type="a:IntlAddress"/>]]></programlisting>

<para xml:id='p11'>With these types and declarations, I can write a stylesheet that
processes <tag>address</tag>es and
<tag>intladdress</tag>es and I can do that with full knowledge
of the attributes and children each can have because I know how
<tag>address</tag> and <tag>intladdress</tag> are
declared.</para>

<para xml:id='p12'>Enter <tag class="attribute">xsi:type</tag>.</para>

<para xml:id='p13'>Because an international address is defined as an extension of an address,
things like this are valid:</para>

<programlisting><![CDATA[<address xsi:type="a:IntlAddress">
  <addressee>The House of Commons</addressee>
  <city>London</city>
  <postCode>SW1A 0AA</postCode>
  <country>GB</country>
</address>]]></programlisting>

<para xml:id='p14'>What does this mean? It means that the declaration for <tag>address</tag>
isn’t relevant here. The type of our element has been hijacked.</para>

<para xml:id='p15'>Critically, it means the mental model that knowing an element’s name
(in a particular context) and knowing that it’s valid tells you enough about its
type to process it is <emphasis>wrong</emphasis>.</para>

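<para xml:id='p16'>This XSLT template is going to silently do the wrong thing: it matches
on the element name, knows nothing about <tag class="element">country</tag>,
and so drops the country from the output:</para>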

<programlisting><![CDATA[<xsl:template match="a:address">
  <p class="{local-name(.)}">
    <xsl:for-each select="a:addressee|a:street">
      <xsl:value-of select="."/>
      <br/>
    </xsl:for-each>
    <xsl:value-of select="a:city"/>
    <xsl:if test="a:stateOrProvince">
      <xsl:text>, </xsl:text>
      <xsl:apply-templates select="a:stateOrProvince"/>
    </xsl:if>
    <xsl:text>&#160;&#160;</xsl:text>
    <xsl:value-of select="a:postCode"/>
  </p>
</xsl:template>]]></programlisting>

<para xml:id='p17'><application>XSLT 2.0</application> provides machinery to deal
with this situation. You can, for example, match on the name of the
type instead of the name of the element. But that means you’re going
to need a type-aware XSLT 2.0 processor and a possibly substantial
rewrite of your stylesheets.</para>

<para xml:id='p18'>Your other applications, in <application>Perl</application>,
<application>Python</application>, or your language of
choice, are going to have to be adapted as well. I think that’s going to mean
explicitly checking for <tag class="attribute">xsi:type</tag> or
getting some sort of general access to the PSVI<indexterm>
<primary>PSVI</primary><see>Post Schema Validation Infoset</see>
</indexterm><indexterm><primary>Post Schema Validation Infoset</primary>
</indexterm>.</para>
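
<para>As a rough sketch of what “explicitly checking for
<tag class="attribute">xsi:type</tag>” might look like in
<application>Python</application>, the function below uses only the standard
library’s <code>xml.etree.ElementTree</code>. The function name, the sample
document, and the <code>http://example.com/ns/address</code> namespace are
assumptions made for illustration; the example above never says which
namespace the <code>a</code> prefix is bound to. Because the attribute value
is a QName, the code has to track the in-scope namespace bindings itself
in order to resolve it:</para>

<programlisting language="python"><![CDATA[import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical sample; the namespace URI is an assumption.
SAMPLE = """<address xmlns="http://example.com/ns/address"
         xmlns:a="http://example.com/ns/address"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:type="a:IntlAddress">
  <city>London</city>
  <postCode>SW1A 0AA</postCode>
  <country>GB</country>
</address>"""


def xsi_type_overrides(xml_text):
    """Return (element name, claimed type) pairs, both in {uri}local form."""
    in_scope = []  # stack of (prefix, uri) bindings currently in scope
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events arrive before the "start" event of the element
    # that declares them, so the bindings are in place when we need them.
    for event, payload in ET.iterparse(source, ("start-ns", "end-ns", "start")):
        if event == "start-ns":
            in_scope.append(payload)
        elif event == "end-ns":
            in_scope.pop()
        else:
            qname = payload.get(XSI_TYPE)
            if qname is None:
                continue
            # An unprefixed QName has the empty prefix, which is how
            # ElementTree reports the default namespace binding.
            prefix, _, local = qname.strip().rpartition(":")
            uri = next((u for p, u in reversed(in_scope) if p == prefix), "")
            found.append((payload.tag, "{%s}%s" % (uri, local) if uri else local))
    return found


for name, claimed in xsi_type_overrides(SAMPLE):
    print(name, "claims type", claimed)
]]></programlisting>

<para>Note that this only reports the type an element claims. Deciding whether
that type really derives from the declared one, or what children it permits,
still takes the schema.</para>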

<para xml:id='p19'>This isn’t news, of course. In fact, I must have known this at some level
all along. But my attention was drawn to this particular side-effect
of <tag class="attribute">xsi:type</tag> recently and it makes me
like <tag class="attribute">xsi:type</tag> even less.</para>

<para xml:id='p20'>W3C XML Schema fans will probably tell me that if I just accept
the W3C XML Schema type-based model of the world, upgrade my tools,
and treat the type information as primary, it’ll all “just work”.
What’s more, I expect they’ll tell me this is “the right way” to work
with XML. Then one of them will tell me that XML is, like, you know,
“object oriented.” I don’t know if that’ll make me laugh or cry, but
I’ll argue that
<link xlink:href="/2003/06/01/xmlnotoo">they’re wrong</link>.</para>

<para xml:id='p21'>Alas, you can’t prevent the use of <tag class="attribute">xsi:type</tag>
(W3C XML Schema implicitly declares it globally and offers no provision for
constraining its use). So if you support W3C XML Schema, you’ll have to
support <tag class="attribute">xsi:type</tag> as well, or use some other
constraint mechanism to prevent it.</para>
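
<para>One possible “other constraint mechanism” is a check, run before or
alongside validation, that simply refuses documents carrying the attribute.
A minimal sketch in <application>Python</application>, using only the standard
library; the function name is hypothetical and the error handling is
deliberately crude:</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def reject_xsi_type(xml_text):
    """Parse the document; raise ValueError if any element carries xsi:type."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the root element and all of its descendants.
    offenders = [el.tag for el in root.iter() if XSI_TYPE in el.attrib]
    if offenders:
        raise ValueError("xsi:type is not allowed here: " + ", ".join(offenders))
    return root
]]></programlisting>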

</essay>
