Infoset Equality

Volume 7, Issue 86; 19 May 2004; last modified 08 Oct 2010

From the Technical Plenary, a URI that got lost: a quick “off-the-cuff” definition for XML chunk equality based on the Infoset.

At the W3C Technical Plenary in March 2004, the XML Core Working Group and the TAG met to discuss the “XML chunk” issue.

Part of that discussion was about what it means for two chunks of XML to be equal. I banged up a quick “off-the-cuff” definition for equality based on the Infoset.

I wanted to make the definition available during the meeting, so I dropped it into my “scratch space” on this site. That URI must have made it into some record of the meeting because it turns up in my logs occasionally. In the spirit of keeping URIs persistent, here is the definition that I proposed:

1. Document Information Items

Two document information items are equal if their [children]
properties are equal, ignoring processing instructions and comments.

2. Element Information Items

Two element information items are equal if the following properties
are equal:

  - [namespace name]
  - [local name]
  - [children]
  - [attributes]

Children are compared in order, attributes without respect to order.

3. Attribute Information Items

Two attribute information items are equal if the following properties
are equal:

  - [namespace name]
  - [local name]
  - [normalized value]

4. Character Information Items

Two character information items are equal if the following properties
are equal:

  - [character code]

5. Unparsed Entity Information Items

Two unparsed entity information items are equal if the following
properties are equal:

  - [name]
  - [system identifier]
  - [notation name]

It’s not a complete definition (there are a few more information items that would have to be considered); it was just an attempt to show that a definition based on the Infoset could be written. If that seems like a self-evident statement, well, all I can say is that it is sometimes useful at working group meetings to say explicitly things that are self-evident.
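To make the rules a little more concrete, here is a minimal sketch of rules 1 through 4 in Python, using the standard library's xml.dom.minidom as a stand-in for an Infoset. It was not part of the original proposal, and it makes several assumptions of its own: the function names (content, attributes, elements_equal, children_equal, documents_equal) are invented for illustration; comments and processing instructions are ignored at every level, not only directly under the document; the document type declaration is skipped; adjacent text and CDATA nodes are merged so that comparing strings amounts to comparing character information items one by one; and unparsed entities (rule 5) are left out entirely.

```python
from xml.dom import Node, XMLNS_NAMESPACE
from xml.dom.minidom import parseString

SKIPPED = (Node.COMMENT_NODE, Node.PROCESSING_INSTRUCTION_NODE)
TEXT = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def content(node):
    """Return the children in document order: elements as nodes,
    runs of character data merged into single strings."""
    items = []
    for child in node.childNodes:
        if child.nodeType in SKIPPED:
            continue  # assumption: ignored at every level
        if child.nodeType in TEXT:
            if not child.data:
                continue  # an empty run contributes no characters
            if items and isinstance(items[-1], str):
                items[-1] += child.data
            else:
                items.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            items.append(child)
        # anything else (e.g. the doctype) is skipped: an assumption
    return items


def attributes(element):
    """Map (namespace name, local name) to the normalized value.
    A dict compares without respect to order, as rule 2 requires."""
    attrs = element.attributes
    result = {}
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.namespaceURI == XMLNS_NAMESPACE:
            continue  # xmlns declarations are not in [attributes]
        result[(attr.namespaceURI, attr.localName)] = attr.value
    return result


def children_equal(xs, ys):
    """Compare two child lists in order (rules 2 and 4)."""
    if len(xs) != len(ys):
        return False
    for x, y in zip(xs, ys):
        if isinstance(x, str) != isinstance(y, str):
            return False  # text on one side, element on the other
        if isinstance(x, str):
            if x != y:
                return False
        elif not elements_equal(x, y):
            return False
    return True


def elements_equal(a, b):
    """Rule 2: namespace name, local name, attributes, children."""
    if (a.namespaceURI, a.localName) != (b.namespaceURI, b.localName):
        return False
    if attributes(a) != attributes(b):
        return False
    return children_equal(content(a), content(b))


def documents_equal(doc_a, doc_b):
    """Rule 1: compare [children], ignoring PIs and comments."""
    return children_equal(content(doc_a), content(doc_b))


a = parseString('<!-- note --><p:doc xmlns:p="urn:x" a="1" b="2">hi<p:e/></p:doc>')
b = parseString('<doc xmlns="urn:x" b="2" a="1"><![CDATA[hi]]><e></e><?pi x?></doc>')
c = parseString('<doc xmlns="urn:y" a="1" b="2">hi<e/></doc>')

print(documents_equal(a, b))  # True: prefixes, attribute order, CDATA differ only lexically
print(documents_equal(a, c))  # False: different namespace name
```

Under these assumptions, differences in prefix choice, attribute order, and CDATA versus plain text do not affect the result, while a different namespace name or a different child order does. Whitespace-only text is significant here, since each whitespace character is still a character information item.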

Comments

Check out the SQL-2003 part 14 where an infoset-based sameness for XML datatypes is being defined. However, your example shows why equality is a controversial topic. I would think that for many applications, PIs and comments need to be considered for equality, but not for others...

—Posted by Michael Rys on 26 May 2004 @ 01:20 UTC #