<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
<title>Thinking differently about XML</title><biblioid class="uri">http://norman.walsh.name/2008/08/04/aboutXML</biblioid>
<volumenum>11</volumenum>
<issuenum>55</issuenum>
<pubdate>2008-08-04T16:32:29-04:00</pubdate>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2008</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>Having an XML server at my disposal is making me think about XML
applications differently.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#MarkLogic"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#W3C"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
</info>

<para xml:id="p1">I've been writing XML applications for a long time. Arguably,
since we spelled XML
<wikipedia page="SGML">“es” “gee” “em” “el”</wikipedia>.
In those years, I'd grown to think of XML applications as being
primarily operations on some principal document: a book, a web page,
an Atom feed, what have you. (I'm not saying all XML applications are like
this; I'm just saying this is how I tended to think about them.)</para>

<para xml:id="p2">Individual documents were sometimes composed from several files
(via <wikipedia page="SGML_entity">entities</wikipedia>
or <wikipedia>XInclude</wikipedia>)
and some applications operated on a small number
of files, but there was always at least some logical sense in which there
was “the main file” and its ancillary files.</para>

<para xml:id="p3">When my application involved a potentially large number of
files, I usually massaged them into a single file and used that as
one of my small number of files. All of the many and varied sources
of information used to present essays in this weblog, for example, are
aggregated into a <link xlink:href="/knows/norman.walsh.name.rdf">honking
big RDF/XML document</link> and that document is used as an ancillary
resource when formatting the XML for each essay, the “main file”.</para>

<para xml:id="p4">One of the demos I constructed to learn more about
<citetitle xlink:href="http://www.marklogic.com/product/marklogic-server.html">Mark Logic Server</citetitle>
was a “W3C Spec Explorer”.</para>

<mediaobject role="flickr">
    <!--Spec Explorer: View by Editor-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/2732801307/">
    <imagedata fileref="http://farm4.static.flickr.com/3208/2732801307_3bcaf1db95.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p5">I took all of the W3C specs and poured them
into the server, and then I set out to write some
<wikipedia>XQuery</wikipedia> code that would
allow me to view the specifications by date, by working group, by
editor, and through full-text search (or any combination of those
options simultaneously).</para>

<mediaobject role="flickr">
    <!--Spec Explorer: Search Results-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/2732801031/">
    <imagedata fileref="http://farm4.static.flickr.com/3082/2732801031_6228487d3e.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p6">My starting point for building the sort of faceted navigation that
I had in mind was the <link xlink:href="http://www.w3.org/2002/01/tr-automation/">RDF
metadata</link> that the W3C provides. (Not only was I interested in having
better full-text search of the specs for myself, I was also interested in
exploring RDF in the server.)</para>

<para xml:id="p7">In the course of learning how best to build this application, I posted
a question on the internal “discuss” list asking some fairly basic questions
about how to efficiently search RDF's odd serialization. One of the replies
that I got suggested (clearly, concisely, and patiently) that I was
thinking about the problem from the wrong end. I had gone out and asked
the server to give me a fairly big document and now I was trying to grub
around inside it to find stuff. Instead, I should “push the constraints to
the database”. Don't just ask the server (the database) for the file; ask
it for the actual elements I care about.</para>

<para xml:id="p8">This did two things: first, it made my searches instantaneous or so
nearly so as to make no difference. Second, it made me start to think very
differently about XML applications.</para>

<para xml:id="p9">The server's “universal index” over all the content in the database
makes it practical (often blindingly fast) to ask questions over an
enormous number of documents.</para>

<para xml:id="p10">Want all the <tag>rdf:Description</tag> elements
of type “REC”? Just ask for them: <code>//rdf:Description[rdf:type/@rdf:resource="…#REC"]</code>.
That's not just the matching descendants of some document; that's all of them
<emphasis>anywhere in the database</emphasis>.</para>
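
<para>Roughly, the contrast looks like this. It's a sketch, not the
code from the Spec Explorer: the document URI <code>/w3c/tr.rdf</code> is
a name invented for illustration, and the type URI stays elided just as
it is in the expression above.</para>

<programlisting language="xquery"><![CDATA[declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

(: Before: fetch one big document, then grub around inside it.
   "/w3c/tr.rdf" is a hypothetical document URI. :)
let $from-one-document :=
  doc("/w3c/tr.rdf")//rdf:Description[rdf:type/@rdf:resource = "…#REC"]

(: After: push the constraint to the database and ask only for the
   elements of interest. With no context item, the leading // is assumed
   to range over every document in the database, as described in the
   text; a standalone XQuery processor would want a context item here. :)
let $from-whole-database :=
  //rdf:Description[rdf:type/@rdf:resource = "…#REC"]

return (count($from-one-document), count($from-whole-database))
]]></programlisting>

<para>The two expressions differ only in where they start, which is the
whole point: the predicate is the same, but in the second form the
server gets to answer it from its index instead of handing back a big
document to be filtered afterwards.</para>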

<para xml:id="p11">In fact, in my application, I broke the big RDF document up into
a bunch of documents, one for each <tag>rdf:Description</tag>, so that
I could ask for <code>/rdf:Description</code>, rather than looking at
all descendants anywhere. Had I needed to, I could have limited the
search to a particular collection or used any of a number of other
mechanisms for making it very targeted.</para>
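
<para>As a sketch of what those narrower queries might look like,
assuming the descriptions have already been split into separate
documents: the collection name <code>w3c-specs</code> is invented for
illustration, and the type URI is elided as before.</para>

<programlisting language="xquery"><![CDATA[declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

(: With one rdf:Description per document, the element is the document
   root, so a single / step replaces the // descendant search. :)
let $recs :=
  /rdf:Description[rdf:type/@rdf:resource = "…#REC"]

(: Narrower still: look only at documents in one collection.
   "w3c-specs" is a hypothetical collection name. :)
let $recs-in-collection :=
  collection("w3c-specs")/rdf:Description[rdf:type/@rdf:resource = "…#REC"]

return (count($recs), count($recs-in-collection))
]]></programlisting>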

<para xml:id="p12">Maybe I'm just discovering something that was obvious to all of
you, but I'm now thinking of XML applications over not just a few
files, but a whole database. My world is suddenly a lot bigger, which
is very cool.</para>

<para xml:id="p13">There's a cool footnote to this essay too (though I'm not
actually choosing to make it a <tag>footnote</tag>, but never mind).
The guy doing all the patient explaining was
<link xlink:href="http://marklogic.blogspot.com/">
      <personname>
<firstname>Dave</firstname>
<surname>Kellogg</surname>
      </personname>
    </link>, our CEO. Bonus points for
a paradigm-shifting answer.</para>

<para xml:id="p14">That our CEO took the time to read <emphasis>and
answer</emphasis> a mundane technical question from a newbie on an
internal list set the stage for something that was really driven home to me a
couple of weeks ago at my first ever “semi-annual kickoff meeting”: this
company is full of excellent people. Every single one of them, as far
as I can tell.</para>

<para xml:id="p15">I'm having a ball.</para>

</essay>

