xsi:type Train Wreck

Volume 7, Issue 18; 29 Jan 2004; last modified 08 Oct 2010

I’ve never liked xsi:type. From my perspective, elements have declarations and those declarations tell you the type of an element.

A rusty nail placed near a faithful compass, will sway it from the truth, and wreck the argosy.

—Sir Walter Scott

I’ve never liked xsi:type. Whenever I think about it, my intuition whispers softly that any design that requires it is somehow flawed or at least ugly.

From my perspective, elements have declarations and those declarations tell you the type of an element. I don’t care if the notion of type is explicitly part of the schema language or not.

An <address> is an “address” because somewhere there’s a declaration for that element that tells you what attributes and children it can have. Things that can have streets and cities and postal codes are addresses so that’s what makes an <address> an “address”.

Here’s an address type, for example:

<xs:complexType name="Address">
  <xs:sequence>
    <xs:element name="addressee" type="xs:string"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="street" type="xs:string"
                minOccurs="0" maxOccurs="3"/>
    <xs:element name="city" type="xs:string"/>
    <xs:element name="stateOrProvince" type="xs:string"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="postCode" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

And we might associate that type with a particular element using the declaration:

<xs:element name="address" type="a:Address"/>

In the old days, when we used DTDs, there was a one-to-one correspondence between element names and their declarations. That’s no longer true, but I still think my premise holds. If there are several declarations for <address>, at least one of them must apply if the document is valid. (If the document isn’t valid, all bets are off anyway.) If more than one declaration applies, some schema languages call that an error and some don’t. If it isn’t an error, I’d probably like a warning about that in my development environment, but as far as I’m concerned, the tool is free to pick any one of the declarations (or all of them, if that’s appropriate in the processing environment).

Now, my choice of element names in the address type betrays a desire to handle international addresses (if I were only concerned with U.S. addresses, I’d have just named the children state and zip). So let’s make an international address type:

<xs:complexType name="IntlAddress">
  <xs:complexContent>
    <xs:extension base="a:Address">
        <xs:sequence>
          <xs:element name="country" type="xs:string"/>
        </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

And associate it with an element:

<xs:element name="intladdress" type="a:IntlAddress"/>

With these types and declarations, I can write a stylesheet that processes addresses and intladdresses and I can do that with full knowledge of the attributes and children each can have because I know how address and intladdress are declared.

Enter xsi:type.

Because an international address is defined as an extension of an address, things like this are valid:

<address xsi:type="a:IntlAddress">
  <addressee>The House of Commons</addressee>
  <city>London</city>
  <postCode>SW1A 0AA</postCode>
  <country>GB</country>
</address>

What does this mean? It means that the declaration for address isn’t relevant here. The type of our element has been hijacked.

Critically, it means that a common mental model is wrong: knowing an element’s name (in a particular context) and knowing that it’s valid does not tell you enough about its type to process it.

Handed the address above, this XSLT template is going to silently do the wrong thing (the country never reaches the output):

<xsl:template match="a:address">
  <p class="{local-name(.)}">
    <xsl:for-each select="a:addressee|a:street">
      <xsl:value-of select="."/>
      <br/>
    </xsl:for-each>
    <xsl:value-of select="a:city"/>
    <xsl:if test="a:stateOrProvince">
      <xsl:text>, </xsl:text>
      <xsl:apply-templates select="a:stateOrProvince"/>
    </xsl:if>
    <xsl:text>&#160;&#160;</xsl:text>
    <xsl:value-of select="a:postCode"/>
  </p>
</xsl:template>
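To make the failure concrete, here is a rough Python analogue of that template's name-based logic, run against the House of Commons address. It is only a sketch under stated assumptions: the namespace URI bound to a: is hypothetical (the schema's target namespace isn't given here), and the names A, DOC, q and format_address are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the article never says what "a:" is bound to.
A = "http://example.com/address"

# The instance from the article, with namespace declarations added.
DOC = f"""<address xmlns="{A}" xmlns:a="{A}"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:type="a:IntlAddress">
  <addressee>The House of Commons</addressee>
  <city>London</city>
  <postCode>SW1A 0AA</postCode>
  <country>GB</country>
</address>"""


def q(local):
    """Build the {namespace}local tag name that ElementTree uses."""
    return f"{{{A}}}{local}"


def format_address(address):
    """Format an address purely by child element names, as the template does."""
    # addressee and street children, in document order, one per line
    lines = [child.text for child in address
             if child.tag in (q("addressee"), q("street"))]
    last = address.findtext(q("city"), default="")
    state = address.findtext(q("stateOrProvince"))
    if state is not None:
        last += ", " + state
    # two non-breaking spaces, mirroring &#160;&#160; in the template
    last += "\u00a0\u00a0" + address.findtext(q("postCode"), default="")
    return lines + [last]


print(format_address(ET.fromstring(DOC)))
```

The element is named address and the document is valid, so nothing complains; the country child is simply never looked at.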

XSLT 2.0 provides machinery to deal with this situation. You can, for example, match on the name of the type instead of the name of the element. But that means you’re going to need a type-aware XSLT 2.0 processor and a possibly substantial rewrite of your stylesheets.

Your other applications, in Perl, Python, or your language of choice, are going to have to be adapted as well. I think that’s going to mean explicitly checking for xsi:type or getting some sort of general access to the post-schema-validation infoset (PSVI).
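As a minimal sketch of what "explicitly checking for xsi:type" might look like in Python, the following uses only the standard library. The value of xsi:type is a QName, so its prefix has to be resolved against the namespace declarations in scope; comparing the raw string "a:IntlAddress" would break as soon as a document chose a different prefix. The namespace URI for a: and the names SAMPLE and type_overrides are assumptions made for illustration, not anything a schema processor provides.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical namespace URI for the "a:" prefix; the real one is not given.
SAMPLE = """<address xmlns="http://example.com/address"
         xmlns:a="http://example.com/address"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:type="a:IntlAddress">
  <city>London</city>
  <postCode>SW1A 0AA</postCode>
  <country>GB</country>
</address>"""


def type_overrides(xml_text):
    """Return (element tag, expanded type name) for each element with xsi:type."""
    in_scope = []  # stack of (prefix, uri) namespace bindings currently in scope
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start", "start-ns", "end-ns")):
        if event == "start-ns":
            in_scope.append(payload)      # payload is (prefix, uri)
        elif event == "end-ns":
            in_scope.pop()                # the most recent binding goes out of scope
        else:
            qname = payload.get(XSI_TYPE)  # payload is the element being started
            if qname is None:
                continue
            prefix, _, local = qname.strip().rpartition(":")
            # innermost binding wins; an empty prefix means the default namespace
            uri = next((u for p, u in reversed(in_scope) if p == prefix), None)
            found.append((payload.tag, f"{{{uri}}}{local}" if uri else local))
    return found


for tag, type_name in type_overrides(SAMPLE):
    print(tag, "is overridden to", type_name)
```

This only detects the override. Knowing what children the overriding type allows still requires the schema itself, which is where access to the PSVI would come in.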

This isn’t news, of course. In fact, I must have known this at some level all along. But my attention was drawn to this particular side-effect of xsi:type recently and it makes me like xsi:type even less.

W3C XML Schema fans will probably tell me that if I just accept the W3C XML Schema type-based model of the world, upgrade my tools, and treat the type information as primary, it’ll all “just work”. What’s more, I expect they’ll tell me this is “the right way” to work with XML. Then one of them will tell me that XML is, like, you know, “object oriented.” I don’t know if that’ll make me laugh or cry, but I’ll argue that they’re wrong.

Alas, you can’t prevent the use of xsi:type (W3C XML Schema implicitly declares it globally and offers no provision for constraining its use). So you’ll have to support it, or use some other constraint mechanism to prevent it, if you support W3C XML Schema.

Comments

Yet another reason to avoid XSD like the plague. XML development shouldn't require one to buy a thousand-dollar IDE to understand what you're doing.

"Avoid like the plague" is a bit much. I consider it a reason to discourage the use of XSD outside of the domains it was optimized for: XMLization of transaction-oriented data and/or data from relational databases. (This is a bit of a simplification, but only a bit.) I don't have much problem with

<xs:element name="quantity" type="xs:integer"/>

but when dealing with data more oriented toward publishable content, then, as Norm shows, XSD gets messier and messier.

I've been thinking for a while that a distinction between publishable content and transaction-oriented data is more useful than the traditional distinction in the XML world between "documents" and "data", since all well-formed XML is both.

I think it's definitely a question of choosing the right tool for the job, and that means knowing the strengths and limitations of your tools.

But your example strikes me as a little off-topic, Bob. It was never my intent to suggest it was bad to declare types. The tricky bit occurs when someone says:

in a document. And even then, it's probably harmless because byte is a restriction of integer. In fact, for simple types it may not matter very much (though I haven't thought through all the possibilities).

As a newbie I developed a lot of applications using XML as a command language. I am particularly fond of things like:

Is this a difficult structure to document using xsd?

I cannot figure it out, but this article seemed to be close to the topic of elements with the same name but with a differentiating "type" attribute....

It's difficult as in impossible to have an XSD with different content models for the different 'cmd' elements based on the value of the type attribute.

DTDs can't do it either, but RELAX NG can.