<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       version="pto">
<info>
<title>XML Is Not Object Oriented</title>
<volumenum>6</volumenum>
<issuenum>26</issuenum>
<pubdate>2003-06-01</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2003</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Elements are not objects, their attributes and children are neither
fields nor methods, and content models are not related by inheritance.</para>
</abstract>
</info>
<epigraph>
<attribution>Montaigne</attribution>
<para xml:id='p1'><indexterm><primary>Montaigne</primary></indexterm>The most
universal quality is diversity.
</para>
</epigraph>

<para xml:id='p2'>Let me say right up front that I'm an object oriented<indexterm>
<primary>Programming</primary><secondary>Object Oriented</secondary></indexterm>
kinda guy.
I have faith in the paradigm; I've drunk the Kool-Aid. I'm absolutely
<emphasis>not</emphasis> suggesting that XML applications shouldn't be
written using an object oriented style. To the contrary, most of the
XML applications that I've written are object oriented from top to
bottom.</para>

<para xml:id='p3'>What I am saying is that the constituent elements and attributes
of an XML vocabulary are not <emphasis>generally</emphasis> related to
each other by inheritance, nor do they naturally correspond to objects
with any kind of precision.</para>

<note><title>Note</title>
<para xml:id='p4'>I'm well aware that there are many applications where there
<emphasis>is</emphasis> a natural correspondence between an object
graph and an XML serialization of that graph. And there are really
good tools like
<link xlink:href="http://java.sun.com/xml/jaxb/"><application>JAXB</application></link> for effectively and
efficiently marshaling and unmarshaling data in those cases. What
I'm saying is that it's not <emphasis>generally</emphasis> true.</para>
</note>

<para xml:id='p5'>In particular, vocabularies like DocBook that are predominantly
mixed content, are designed for semantic markup of human readable
text, and need to provide considerable flexibility for customization
by end users, should not be modeled as if there were some inheritance
relationship between the elements or as if one element were derived by
some sort of extension or restriction from some other element.</para>

<section xml:id='s1'>
<title>Elements Aren't Derived</title>

<para xml:id='p6'>A class in an object oriented language can be thought of as
defining a chunk of data that consists of a list of fields (property/value
pairs) and a list of methods for accessing and
manipulating those fields. (At this level of abstraction, we can
ignore the details of encapsulation that have to do with the
accessibility of fields and methods.)</para>

<para xml:id='p7'>When one class extends another it can add new fields and new
methods (and it may be able to redefine the function body of existing
methods), but it doesn't remove existing fields or radically alter
the internal structure of the object.
</para>

<para xml:id='p8'>The point is that a piece of code that knows how to handle an
object of class X will automatically be able to handle objects of
class X' (derived from X). There may be additional fields and methods
provided by X' that are unknown, but that won't have any impact on the
code that's only using the fields and methods defined in X.</para>
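
<para>Here's a minimal sketch of that guarantee in Python. The class
names <literal>X</literal> and <literal>XPrime</literal>, their fields,
and the <literal>handle</literal> function are all made up for
illustration; the only point is that code written against the base
class keeps working when handed the derived one.</para>

<programlisting language="python"><![CDATA[class X:
    """Base class: one field and one method."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return "name=" + self.name


class XPrime(X):
    """Derived class: adds a field and a method, removes nothing."""

    def __init__(self, name, color):
        super().__init__(name)
        self.color = color

    def paint(self):
        return self.name + " painted " + self.color


def handle(obj):
    # Written against X only: it uses nothing that XPrime adds.
    return obj.describe().upper()


plain = handle(X("widget"))
derived = handle(XPrime("widget", "red"))]]></programlisting>

<para>Both calls to <literal>handle</literal> succeed and return the
same string, because <literal>XPrime</literal> only adds to what
<literal>X</literal> already provides.</para>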

<para xml:id='p9'>In XML, things that sound like derivation often aren't. Consider
paragraphs and formal paragraphs:</para>

<programlisting><![CDATA[<para>This is a paragraph</para>

<formalpara>
<title>Paragraph Title</title>
<para>This is a formal paragraph.</para>
</formalpara>]]></programlisting>

<para xml:id='p12'>If I told you that a formal paragraph was a paragraph with a
title, you might imagine that a formal paragraph was an extension of a
<quote>normal</quote> paragraph. But closer inspection reveals that it
doesn't work that way: a formal paragraph contains a paragraph and a title
(it's an aggregate); it doesn't extend the original paragraph.</para>
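
<para>If you tried to mirror that structure in code, it might look
something like the following Python sketch. The
<literal>Para</literal> and <literal>FormalPara</literal> classes and
the <literal>render</literal> function are hypothetical stand-ins, not
part of any DocBook tooling. The formal paragraph holds a paragraph
rather than extending one, so code written for paragraphs can't be
handed a formal paragraph directly.</para>

<programlisting language="python"><![CDATA[from dataclasses import dataclass


@dataclass
class Para:
    text: str


@dataclass
class FormalPara:
    # Holds a Para rather than extending one.
    title: str
    para: Para


def render(para):
    # Written for Para: expects a .text field.
    return "<p>" + para.text + "</p>"


formal = FormalPara("Paragraph Title", Para("This is a formal paragraph."))

# A FormalPara is not a kind of Para.
is_a_para = isinstance(formal, Para)

# Rendering works only if we reach inside the aggregate.
inner = render(formal.para)

# Passing the aggregate itself fails: it has no .text field.
try:
    render(formal)
    direct_ok = True
except AttributeError:
    direct_ok = False]]></programlisting>

<para>In this sketch the caller has to know about the wrapper and
unpack it, which is exactly what derivation is supposed to make
unnecessary.</para>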

<para xml:id='p13'>Similarly, you might imagine that the various sorts of lists are
all derived from some common type, but that doesn't work either. In
DocBook, for example, itemized and ordered lists are similar, but
variable lists and segmented lists bear only a vague structural
similarity to them.</para>

<para xml:id='p14'>What's more, even in cases where the structural similarities
would make derivation sensible to the original designers, customizers
often want to make changes that break the pattern. You might, for
example, have chosen to derive itemized and ordered lists from some
common supertype, but if a customizer wants to remove an attribute
from ordered lists, they'll be breaking the derivation.</para>
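
<para>To make that concrete, here's a rough Python sketch. The common
supertype <literal>ListBase</literal>, the choice of
<literal>spacing</literal> as the shared attribute, and the
<literal>layout</literal> function are assumptions made purely for
illustration; they don't describe how DocBook is actually defined.
The customized ordered list drops the inherited attribute, and code
written against the supertype stops working.</para>

<programlisting language="python"><![CDATA[class ListBase:
    """Hypothetical common supertype for itemized and ordered lists."""

    def __init__(self, items, spacing="normal"):
        self.items = items
        self.spacing = spacing


class ItemizedList(ListBase):
    pass


class CustomOrderedList(ListBase):
    """A customization that removes the inherited 'spacing' attribute."""

    def __init__(self, items):
        super().__init__(items)
        del self.spacing  # the customizer's change


def layout(lst):
    # Written against ListBase: relies on 'spacing' being present.
    style = "compact" if lst.spacing == "compact" else "normal"
    return style + ":" + str(len(lst.items))


ok = layout(ItemizedList(["a", "b"], spacing="compact"))

try:
    layout(CustomOrderedList(["a"]))
    survived = True
except AttributeError:
    survived = False]]></programlisting>

<para>The itemized list still lays out fine, but the customized
ordered list raises an error, because removing something is not a
change that extension can express.</para>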

<para xml:id='p15'>It just doesn't work.</para>

</section>

<section xml:id='s2'>
<title>Content Models Aren't Inherited</title>

<para xml:id='p16'>In general, most content models in vocabularies like DocBook are
<quote>bags of stuff</quote>. There just isn't any sensible
inheritance model for them. There are almost no elements about which
you can say, <quote>oh it's just like this other element except that it
allows a couple of new elements</quote>. And even when you can say that,
the resemblance is probably beside the point.</para>

<para xml:id='p17'>And what I said before about customizers wanting to change things
in ways that don't suit an inheritance model you might have concocted applies
in spades to content models. There's almost no change that someone won't
need to make. Sometimes customizers want to attack your content models with
a machete, and sometimes with a glue gun.</para>

<para xml:id='p18'>You might argue that if it were all done right, the inheritance model
would make the customizer's job easy. That, in fact, it's its failure to do so
that brings out the machetes and the glue guns. But I don't think that's true.</para>

<para xml:id='p19'>Customizers change content models to suit real business needs.
And those needs aren't likely to fit neatly into your design.
Especially when the customizers are adapting your vocabulary to uses
you never imagined.</para>

<para xml:id='p20'>And they will. If you let them.</para>

</section>

</essay>
