xsi:type Train Wreck
A rusty nail placed near a faithful compass, will sway it from the truth, and wreck the argosy.
I’ve never liked xsi:type. Whenever I think about it, my intuition whispers softly that any design that requires it is somehow flawed or at least ugly.
From my perspective, elements have declarations and those declarations tell you the type of an element. I don’t care if the notion of type is explicitly part of the schema language or not.
An <address> is an “address” because somewhere there’s a declaration for that element that tells you what attributes and children it can have. Things that can have streets and cities and postal codes are addresses, so that’s what makes an <address> an “address”.
Here’s an address type, for example:
<xs:complexType name="Address">
  <xs:sequence>
    <xs:element name="addressee" type="xs:string"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="street" type="xs:string"
                minOccurs="0" maxOccurs="3"/>
    <xs:element name="city" type="xs:string"/>
    <xs:element name="stateOrProvince" type="xs:string"
                minOccurs="0" maxOccurs="1"/>
    <xs:element name="postCode" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
And we might associate that type with a particular element using the declaration:
<xs:element name="address" type="a:Address"/>
In the old days, when we used DTDs, there was a one-to-one correspondence between element names and their declarations. That’s no longer true, but I still think my premise holds. If there are several declarations for <address>, at least one of them must apply if the document is valid. (If the document isn’t valid, all bets are off anyway.) If more than one declaration applies, some schema languages call that an error and some don’t. If it isn’t an error, I’d probably like a warning about that in my development environment, but as far as I’m concerned, the tool is free to pick any one of the declarations (or all of them, if that’s appropriate in the processing environment).
Now, my choice of element names in the address type betrays a desire to handle international addresses (if I were only concerned with U.S. addresses, I’d have just named the children state and zip). So let’s make an international address type:
<xs:complexType name="IntlAddress">
  <xs:complexContent>
    <xs:extension base="a:Address">
        <xs:sequence>
          <xs:element name="country" type="xs:string"/>
        </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
And associate it with an element:
<xs:element name="intladdress" type="a:IntlAddress"/>
With these types and declarations, I can write a stylesheet that processes addresses and intladdresses, and I can do that with full knowledge of the attributes and children each can have because I know how address and intladdress are declared.
Enter xsi:type.
Because an international address is defined as an extension of an address, things like this are valid:
<address xsi:type="a:IntlAddress">
  <addressee>The House of Commons</addressee>
  <city>London</city>
  <postCode>SW1A 0AA</postCode>
  <country>GB</country>
</address>
What does this mean? It means that the declaration for address isn’t relevant here. The type of our element has been hijacked.
Critically, it means that a comfortable mental model is wrong: knowing an element’s name (in a particular context) and knowing that it’s valid does not tell you enough about its type to process it.
This XSLT template is going to silently do the wrong thing (given the address above, it never outputs the country):
<xsl:template match="a:address">
  <p class="{local-name(.)}">
    <xsl:for-each select="a:addressee|a:street">
      <xsl:value-of select="."/>
      <br/>
    </xsl:for-each>
    <xsl:value-of select="a:city"/>
    <xsl:if test="a:stateOrProvince">
      <xsl:text>, </xsl:text>
      <xsl:apply-templates select="a:stateOrProvince"/>
    </xsl:if>
    <xsl:text>  </xsl:text>
    <xsl:value-of select="a:postCode"/>
  </p>
</xsl:template>
XSLT 2.0 provides machinery to deal with this situation. You can, for example, match on the name of the type instead of the name of the element. But that means you’re going to need a type-aware XSLT 2.0 processor and a possibly substantial rewrite of your stylesheets.
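As a rough illustration of matching on the type rather than the name, here is a minimal sketch for a schema-aware XSLT 2.0 processor. The namespace URI http://example.com/ns/address and the file name address.xsd are placeholders (the post never gives them), and the input has to be schema-validated for the type annotations to exist at all.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:a="http://example.com/ns/address">

  <!-- Placeholder namespace and location: substitute your own schema. -->
  <xsl:import-schema namespace="http://example.com/ns/address"
                     schema-location="address.xsd"/>

  <!-- Fires for any validated element whose type is a:IntlAddress,
       whether it is named intladdress or is an address carrying
       xsi:type="a:IntlAddress". -->
  <xsl:template match="element(*, a:IntlAddress)" priority="2">
    <p class="{local-name(.)}">
      <xsl:value-of select="a:city"/>
      <xsl:text>  </xsl:text>
      <xsl:value-of select="a:postCode"/>
      <br/>
      <xsl:value-of select="a:country"/>
    </p>
  </xsl:template>

  <!-- element(*, a:Address) also matches types derived from a:Address,
       so the more specific template above is given the higher priority. -->
  <xsl:template match="element(*, a:Address)" priority="1">
    <p class="{local-name(.)}">
      <xsl:value-of select="a:city"/>
      <xsl:text>  </xsl:text>
      <xsl:value-of select="a:postCode"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The handling of addressee, street, and stateOrProvince is left out to keep the sketch short. The explicit priorities matter because both patterns have the same form and would otherwise conflict on an international address.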
Your other applications, in Perl, Python, your language of choice, are going to have to be adapted as well. I think that’s going to mean explicitly checking for xsi:type or getting some sort of general access to the PSVI (the post-schema-validation infoset).
This isn’t news, of course. In fact, I must have known this at some level all along. But my attention was drawn to this particular side-effect of xsi:type recently and it makes me like xsi:type even less.
W3C XML Schema fans will probably tell me that if I just accept the W3C XML Schema type-based model of the world, upgrade my tools, and treat the type information as primary, it’ll all “just work”. What’s more, I expect they’ll tell me this is “the right way” to work with XML. Then one of them will tell me that XML is, like, you know, “object oriented.” I don’t know if that’ll make me laugh or cry, but I’ll argue that they’re wrong.
Alas, you can’t prevent the use of xsi:type (W3C XML Schema implicitly declares it globally and offers no provision for constraining its use). So if you support W3C XML Schema, you’ll have to support xsi:type or use some other constraint mechanism to prevent it.
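One candidate for that “other constraint mechanism” is a Schematron check run alongside schema validation. The following is only a sketch, assuming an ISO Schematron processor is available in the pipeline; the wording of the message is made up for illustration.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">

  <!-- Bind the xsi prefix so it can be used in the XPath below. -->
  <sch:ns prefix="xsi" uri="http://www.w3.org/2001/XMLSchema-instance"/>

  <sch:pattern>
    <!-- Visit every element and fail any that carries xsi:type. -->
    <sch:rule context="*">
      <sch:assert test="not(@xsi:type)">
        The element <sch:name/> must not use xsi:type.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

</sch:schema>
```

Within XSD itself, a block attribute on an element declaration (for example block="extension") stops derived types from being substituted this way, but it does not keep the attribute out of the document, so an external check like this one may still be wanted.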
Comments
Yet another reason to avoid XSD like the plague. XML development shouldn't require one to buy a thousand-dollar IDE to understand what you're doing.
"Avoid like the plague" is a bit much. I consider it a reason to discourage the use of XSD outside of the domains it was optimized for: XMLization of transaction-oriented data and/or data from relational databases. (This is a bit of a simplification, but only a bit.) I don't have much problem with
<xs:element name="quantity" type="xs:integer"/>
but when dealing with data more oriented toward publishable content, then, as Norm shows, XSD gets messier and messier.
I've been thinking for a while that a distinction between publishable content and transaction-oriented data is more useful than the traditional distinction in the XML world between "documents" and "data", since all well-formed XML is both.
I think it's definitely a question of choosing the right tool for the job, and that means knowing the strengths and limitations of your tools.
But your example strikes me as a little off-topic, Bob. It was never my intent to suggest it was bad to declare types. The tricky bit occurs when someone says:
<quantity xsi:type="byte">34</quantity>
in a document. And even then, it's probably harmless because byte is a restriction of integer. In fact, for simple types it may not matter very much (though I haven't thought through all the possibilities).
As a newbie I developed a lot of applications using XML as a command language. I am particularly fond of things like:
<cmd type="delete" item-id="2342"/>
<cmd type="insert"/>
<cmd type="edit" item-id="2342"/>
Is this a difficult structure to document using XSD?
I cannot figure it out, but this article seemed to be close to the topic of elements with the same name but with a differentiating "type" attribute.
It's difficult as in impossible to have an XSD with different content models for the different 'cmd' elements based on the value of the type attribute.
DTDs can't do it either, but RELAX NG can.
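For the curious, a RELAX NG pattern along these lines might look like the sketch below. It assumes, going only by the three examples above, that delete and edit require an item-id and that insert takes none; the real rules for cmd may well differ.

```xml
<element name="cmd" xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <!-- delete and edit: an item-id is required (assumed from the examples) -->
    <group>
      <attribute name="type">
        <choice>
          <value>delete</value>
          <value>edit</value>
        </choice>
      </attribute>
      <attribute name="item-id"/>
    </group>
    <!-- insert: no item-id allowed (also an assumption) -->
    <attribute name="type">
      <value>insert</value>
    </attribute>
  </choice>
  <empty/>
</element>
```

Each branch of the outer choice pairs a permitted value of type with the attributes that go with it, which is the co-occurrence constraint that XSD and DTDs cannot express here.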