<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="/style/atom.xsl"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/"
      xml:lang="EN-us">
   <title>Norman.Walsh.name</title>
   <subtitle>
Norm's musings. Make of them what you will.
</subtitle>
   <link rel="alternate" type="text/html" href="http://norman.walsh.name/"/>
   <link rel="self" href="http://norman.walsh.name/atom/whatsnew-fulltext.xml"/>
   <id>http://norman.walsh.name/atom/whatsnew.xml</id>
   <updated>2010-03-19T00:11:11Z</updated>
   <author>
      <name>Norman Walsh</name>
   </author>
   <entry>
      <title>Creating a DocBook V5.0 DTD</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2010/03/18/rng2dtd"/>
      <id>http://norman.walsh.name/2010/03/18/rng2dtd</id>
      <published>2010-03-18T19:38:13Z</published>
      <updated>2010-03-19T00:11:11Z</updated>
      <category term="docbook" scheme="http://technorati.com/tag/"/>
      <dc:subject>DocBook</dc:subject>
      <category term="xmlschema-xsd" scheme="http://technorati.com/tag/"/>
      <dc:subject>W3CXMLSchema</dc:subject>
      <category term="xml" scheme="http://technorati.com/tag/"/>
      <dc:subject>XML</dc:subject>
      <category term="xproc" scheme="http://technorati.com/tag/"/>
      <dc:subject>XProc</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Taking another stab at the long-standing problem of producing DTD (and XSD) versions of the DocBook V5.0 family of schemas.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/03/18/rng2dtd">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Taking another stab at the long-standing problem of producing DTD (and XSD) versions of the DocBook V5.0 family of schemas.</p>
            </div>
            <p id="p1">In the course of preparing the DocBook V5.0 schemas, I devised a process that would convert the DocBook V5.0 RELAX NG grammar into an XML DTD (we get the XSD by running <a href="http://code.google.com/p/jing-trang/" shape="rect">Trang</a> over the DTD). Closer inspection quickly reveals two flaws in this process:</p>
            <div class="orderedlist">
               <ol style="list-style: decimal;">
                  <li>
                     <p id="p2">It's incredibly brittle; while it successfully converts the base DocBook schema, it's utterly useless on even a simple customization layer.</p>
                  </li>
                  <li>
                     <p id="p3">It produces utterly crap DTDs.</p>
                  </li>
               </ol>
            </div>
            <p id="p4">The former problem is the one that's really causing me pain, though I admit I'm a little embarrassed by the second.</p>
            <p id="p5">The <a href="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook-publishers"
                  shape="rect">Publishers Subcommittee</a> identified an XML DTD as a requirement (Come <em>on</em>, tool vendors! Get your act together. It's the twenty-first fscking century, already!). My hopes that someone else would fix the problem before I got to it went unfulfilled, so over the last few days I've turned my attention to it in earnest. A few observations:</p>
            <div class="itemizedlist">
               <ul>
                  <li>
                     <p id="p6">The general problem may be insoluble. It may simply be that there's no algorithmic path from the simple and expressive constraints of RELAX NG to the much less expressive constraints of DTDs. Or maybe there are several paths, but the results will never look rational to a human observer. Or maybe the solution is out there, just waiting for some enterprising grad student to find it (nudge, nudge). I'm not sure. I decided that I didn't have time to look for <em>that</em> solution.</p>
                  </li>
                  <li>
                     <p id="p7">If the computer can't solve the whole problem, then we'll have to rely on human intervention to solve the hardest parts. In the DocBook family of schemas, the apparent hard parts are attribute co-constraints and multiple patterns for the same element name.</p>
                  </li>
                  <li>
                      <p id="p8">The solution in both cases is to replace the problematic patterns with a single pattern that is the union of the various options. This creates content models that are too broad: they accept all valid documents, but they also accept some invalid documents. The union of the CALS and HTML table models, for example, is a model in which a <tt class="tag-starttag">&lt;tbody&gt;</tt> element can contain a mixture of CALS <tt class="tag-starttag">&lt;row&gt;</tt> elements and HTML <tt class="tag-starttag">&lt;tr&gt;</tt> elements, among other atrocities. So it has always been with XML DTDs.</p>
                  </li>
                  <li>
                      <p id="p9">The constraints on mixed content in XML DTDs are a pain in the *ss. In a DTD, mixed content <em>must</em> be expressed as <tt class="code">(#PCDATA | a | b | c | …)*</tt>. The <tt class="code">#PCDATA</tt> token must come first, the alternatives must be at the top level (no nested parentheses), and there must be no duplicates among the alternatives.</p>
                  </li>
                  <li>
                     <p id="p10">There are a few places where we allow extensions in “other namespaces.” The <tt class="tag-starttag">&lt;info&gt;</tt> elements can contain arbitrary additional metadata elements, for example, and equations can contain any MathML markup. DTDs and namespaces do not play well together. It might be possible to create a DTD that allowed MathML in the appropriate places, but that's more than the minimum needed to declare victory.</p>
                  </li>
                  <li>
                      <p id="p11">Patterns are a little bit like parameter entities. (OK, a really, really little bit.) It would be nice, where possible, to represent the patterns as parameter entities in the resulting DTD. At worst, it does no harm; at best, it makes the DTD easier to read and may allow some small amount of customization of the DTD (not that I'd recommend that!). In any event, it will solve at least a tiny part of the second problem mentioned above.</p>
                  </li>
               </ul>
            </div>
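             <p>The three mixed-content rules listed above are mechanical enough to check with a few lines of code. What follows is a rough Python sketch and is not part of the actual conversion pipeline. The function name <tt class="code">check_mixed_model</tt> is hypothetical, and the function only examines a single content model written out as a string.</p>
             <div class="programlisting">
               <pre xml:space="preserve">
import re


def check_mixed_model(model):
    """List the ways a DTD mixed-content model breaks the DTD rules.

    An empty list means the model has the required shape:
    (#PCDATA) on its own, or (#PCDATA | a | b | ...)* with #PCDATA
    first, no nested parentheses, and no repeated names.
    """
    text = re.sub(r"\s+", "", model)  # whitespace is not significant here
    if text == "(#PCDATA)":
        return []
    if not (text.startswith("(") and text.endswith(")*")):
        return ["not of the form ( ... )*"]
    inner = text[1:-2]
    if "(" in inner or ")" in inner:
        return ["nested parentheses"]
    tokens = inner.split("|")
    problems = []
    if tokens[0] != "#PCDATA":
        problems.append("#PCDATA is not the first token")
    seen = set()
    for name in tokens[1:]:
        if name in seen:
            problems.append("duplicate name: " + name)
        seen.add(name)
    return problems


print(check_mixed_model("(#PCDATA | emphasis | (link | xref))*"))
# prints ['nested parentheses']
</pre>
             </div>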
            <p id="p12">With these things in mind, I decided to adopt the following approach:</p>
            <div class="orderedlist">
               <ol style="list-style: decimal;">
                  <li>
                      <p id="p13">Create a “DTD” customization layer in RELAX NG that removes the most difficult problems: create unions for the attribute co-constraints, create unions for the elements that are defined by several patterns, remove the “other namespace” extension points, etc.</p>
                  </li>
                  <li>
                      <p id="p14">Create an “override” document that describes a few more operations: for example, removing the <tt class="code">db._phrase</tt> pattern and changing all the patterns that use it so that they use <tt class="code">db.phrase</tt> instead. (If there's an easy way to accomplish that in the RELAX NG customization layer, it eluded me.)</p>
                  </li>
                  <li>
                     <p id="p15">Massage the modified schema until it's possible to create a DTD from it.</p>
                  </li>
               </ol>
            </div>
             <p id="p16">I'd like to say that there was some deep, theoretical insight in the last step, but there wasn't. I just built a pipeline of transformations that got me from A to B. I looked at the document, found something that wouldn't work in a DTD, wrote a transformation to remove it, and added that transformation to the pipeline. Repeat until done. When an XML parser accepted the result and the resulting DTD validated a small, valid DocBook document, I called it done.</p>
            <p id="p17">Here's a 10,000 foot summary of the process.</p>
            <div class="orderedlist">
               <ol style="list-style: decimal;">
                  <li>
                      <p id="p18">Starting with the “DTD” customization layer, perform some simplifications. Discard documentation, Schematron rules, the start pattern, divisions, etc. Turn interleaves into choices; this is a little risky, but seems to be OK in the DocBook family of schemas. Extract the content of pattern definitions, producing a set of elements and a set of “parameter entities”. Drop datatype facet constraints and “notAllowed” content on the floor. Etc.</p>
                  </li>
                  <li>
                     <p id="p19">Apply the overrides, as described above.</p>
                  </li>
                  <li>
                      <p id="p20">Remove “choice” wrappers from around attributes. Fiddle with how “optional” is expressed. In RELAX NG, it's a wrapper; for some steps in this process, it's more convenient to make it an attribute.</p>
                  </li>
                  <li>
                     <p id="p21">Remove parameter entities that are no longer referenced.</p>
                  </li>
                  <li>
                     <p id="p22">Fiddle with “optional” again, moving the optionality down to the references. (An optional reference to something is the same as a reference to an optional something.)</p>
                  </li>
                  <li>
                     <p id="p23">In subsequent steps, it's going to be convenient to be able to distinguish references to attributes from other references, so turn all <tt class="tag-starttag">&lt;ref&gt;</tt> elements that point exclusively to attributes into <tt class="tag-starttag">&lt;attref&gt;</tt> elements.</p>
                  </li>
                  <li>
                     <p id="p24">Flatten chains of references to attributes. (If A points to B points to C points to D which is an attribute, then just make A point to D.)</p>
                  </li>
                  <li>
                     <p id="p25">Fiddle with “optional” again. This time move the optionality up to the attribute declaration. This may require splitting a declaration.</p>
                  </li>
                  <li>
                      <p id="p26">Check for element names defined by more than one pattern. There had better not be any.</p>
                  </li>
                  <li>
                     <p id="p27">Remove “empty” parameter entities and references to them.</p>
                  </li>
                  <li>
                     <p id="p28">Pull “text” up. Replace any reference to a parameter entity that contains <tt class="code">#PCDATA</tt> with a copy of what the parameter entity contains.</p>
                  </li>
                  <li>
                     <p id="p29">Unwrap nested “zero-or-more” elements.</p>
                  </li>
                  <li>
                     <p id="p30">Sort the parameter entities so that we never attempt to use one before it's been declared.</p>
                  </li>
                  <li>
                     <p id="p31">Convert the resulting document into a DTD. Turn parameter entities into <tt class="code">!ENTITY</tt> declarations, turn elements into <tt class="code">!ATTLIST</tt> and <tt class="code">!ELEMENT</tt> declarations, substitute DTD attribute types for the specified types, expand mixed content models, etc.</p>
                  </li>
               </ol>
            </div>
            <p id="p32">The most disappointing part of that last step is fully expanding the content model of every element that contains mixed content. I'd been working pretty hard all along to preserve as many pattern names as possible as parameter entities.</p>
            <p id="p33">The problem is that even though all the relevant parameter entities are simple lists of elements (so they could appear in a mixed content element declaration), sometimes the same element name appears in more than one pattern. So I punted, expanded them all, and removed duplicates. I still think it might be possible to do better.</p>
            <p id="p34">In retrospect, this isn't too surprising. If you go back to the DocBook V4.x DTDs and study the parameter entity structure [“masochist” -ed], you'll find a few places where we twisted the parameter entity structure pretty hard to avoid exactly this problem.</p>
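             <p>The “expand everything and remove duplicates” punt can be sketched in a few lines of Python. For illustration only, this assumes that each parameter entity has already been reduced to a list of names, each of which is either an element name or the name of another parameter entity. The function and entity names are hypothetical.</p>
             <div class="programlisting">
               <pre xml:space="preserve">
def expand_mixed(refs, entities):
    """Build a flat DTD mixed-content model from pattern references.

    refs lists the names used in an element's mixed content. entities maps
    a parameter-entity name to the names it contains, which may include
    other parameter-entity names. Everything is expanded down to element
    names, and only the first occurrence of each element name is kept.
    """
    names = []
    seen = set()

    def walk(name, active):
        if name in entities:
            if name in active:
                raise ValueError("recursive parameter entity: " + name)
            for member in entities[name]:
                walk(member, active | {name})
        elif name not in seen:
            seen.add(name)
            names.append(name)

    for ref in refs:
        walk(ref, frozenset())
    if not names:
        return "(#PCDATA)"
    return "(#PCDATA | " + " | ".join(names) + ")*"


# Hypothetical entities that both contain the element name "link".
entities = {
    "phrase.inlines": ["emphasis", "phrase", "link"],
    "link.inlines": ["link", "xref"],
}
print(expand_mixed(["phrase.inlines", "link.inlines"], entities))
# prints (#PCDATA | emphasis | phrase | link | xref)*
</pre>
             </div>
             <p>Keeping the first occurrence of each name, rather than sorting, preserves the order in which the patterns introduced the names.</p>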
            <p id="p35">In any event, the DTD that results from this process is an XML DTD that appears to validate DocBook documents. With different customization and overrides, the DTD version of the publishers schema also seems to work.</p>
            <p id="p36">I'll get it out in the next day or so for wider testing. It's very likely that there are places where it's not quite right. But it's definitely an improvement over the old process.</p>
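             <p>Two of the simpler steps in the summary above amount to a topological sort and a count. The first sorts the parameter entities so that none is used before it's declared. The second checks for element names defined by more than one pattern. The Python below illustrates those two ideas and nothing more. The real steps are XSLT transformations over an intermediate format. This sketch instead assumes that the parameter entities are a plain dictionary from name to replacement text, and all of the names are hypothetical.</p>
             <div class="programlisting">
               <pre xml:space="preserve">
import re
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9 or later

# Matches a parameter-entity reference such as %db.phrase;
PE_REF = re.compile(r"%([A-Za-z_][\w.\-]*);")


def sort_parameter_entities(entities):
    """Order parameter-entity names so that each is declared before use.

    entities maps a name to its replacement text. Only references to names
    in the dictionary count as dependencies.
    """
    graph = {
        name: {ref for ref in PE_REF.findall(text) if ref in entities}
        for name, text in entities.items()
    }
    # TopologicalSorter takes node -> predecessors and yields predecessors
    # first; it raises graphlib.CycleError if the references form a loop.
    return list(TopologicalSorter(graph).static_order())


def duplicate_element_names(patterns):
    """Return element names that more than one pattern declares.

    patterns is a list of (pattern_name, element_name) pairs.
    """
    counts = Counter(element for _, element in patterns)
    return sorted(name for name, n in counts.items() if n > 1)


entities = {
    "inlines": "%phrase.class; | %link.class;",
    "phrase.class": "phrase | %emph.class;",
    "link.class": "link",
    "emph.class": "emphasis",
}
print(sort_parameter_entities(entities))
print(duplicate_element_names([("db.cals.tbody", "tbody"),
                               ("db.html.tbody", "tbody"),
                               ("db.para", "para")]))
# the second call prints ['tbody']
</pre>
             </div>
             <p>A loop among the references would make <tt class="code">static_order</tt> raise <tt class="code">CycleError</tt>. A DTD couldn't express such a loop anyway, because XML forbids recursive parameter entity references.</p>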
            <div class="section">
               <h2 class="runin">Pipeline Notes </h2>
               <p class="runin" id="p37">
                  <a id="pipenotes" name="pipenotes" shape="rect"/>The process described above is not wholly unlike what I did before. One significant factor that made this attempt more successful was <a href="http://en.wikipedia.org/wiki/XML_pipeline"
                     title="Wikipedia: XML pipeline"
                     shape="rect">XProc</a>
                  <a href="/knows/what/xproc" shape="rect">
                     <img border="0" alt="[L]" src="/graphics/linkgroup.gif"/>
                  </a>. It's not impossible to chain together 14 transformations with a big XSLT 2.0 stylesheet and a bunch of modes, but it's <em>a
whole lot harder</em> to manage.</p>
               <p id="p38">Speaking of XProc, I cheated. The pipeline I'm using today will only work in XML Calabash because it relies on a compound extension step: <tt class="code">cx:until-unchanged</tt>. That step is a little bit like <tt class="code">p:for-each</tt> except that after each iteration it compares the input document to the result of applying the pipeline and repeats the process (using the output of one iteration as the input of the next) until the result is the same as the input.</p>
               <p id="p39">It's not impossible to do this without extending XProc, but it requires writing a different recursive pipeline for each looping step. It was more interesting (for me) to see how hard it would be to write a compound extension step. (So shoot me.)</p>
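                <p>The idea behind <tt class="code">cx:until-unchanged</tt> is a fixed-point loop, which is easy to sketch outside XProc. In the Python below, the loop runs a toy version of the “flatten chains of references” step. Each pass moves every reference one hop further along its chain, and the loop stops when a pass changes nothing. This is an illustration only, and none of these names come from XML Calabash.</p>
                <div class="programlisting">
                  <pre xml:space="preserve">
def until_unchanged(step, doc, max_iterations=100):
    """Apply step repeatedly, feeding each output back in as the next input,
    until a pass returns a result equal to its input."""
    for _ in range(max_iterations):
        result = step(doc)
        if result == doc:
            return result
        doc = result
    # Guard added for this sketch so a step that never settles can't loop forever.
    raise RuntimeError("no fixed point after %d iterations" % max_iterations)


def flatten_once(refs):
    """One pass of reference flattening.

    refs maps a name to the name it points at. A name that points at
    another entry in refs is re-pointed to that entry's target; names
    that point outside refs (say, at an attribute) are left alone.
    """
    return {name: refs.get(target, target) for name, target in refs.items()}


# A points to B points to C points to D (D is the attribute).
chain = {"A": "B", "B": "C", "C": "D"}
print(until_unchanged(flatten_once, chain))
# prints {'A': 'D', 'B': 'D', 'C': 'D'}
</pre>
                </div>
                <p>The <tt class="code">max_iterations</tt> guard belongs to this sketch only. The description of <tt class="code">cx:until-unchanged</tt> above doesn't mention one.</p>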
                <p id="p40">By the way, if you're curious, converting the base DocBook schema to a DTD is a 40-step pipeline (more or less):</p>
               <div class="screen">
                  <pre xml:space="preserve">
INFO: Running pipeline main
INFO: Running xslt rng2dtx
INFO: Running xslt override
INFO: Running cx:until-unchanged remove-choice
INFO: Running xslt attr-remove-choice
INFO: Running xslt attr-remove-choice
INFO: Running xslt attr-remove-choice
INFO: Running xslt attr-remove-choice
INFO: Running cx:until-unchanged remove-unused
INFO: Running xslt attr-remove-unused
INFO: Running xslt attr-remove-unused
INFO: Running xslt attr-remove-unused
INFO: Running xslt attr-optional-to-ref
INFO: Running cx:until-unchanged to-attref
INFO: Running xslt ref-to-attref
INFO: Running xslt ref-to-attref
INFO: Running cx:until-unchanged flatten
INFO: Running xslt flatten-attref
INFO: Running xslt flatten-attref
INFO: Running xslt flatten-attref
INFO: Running xslt attr-optional-to-decl
INFO: Running xslt multiple-gis
INFO: Running xslt remove-empty-pes
INFO: Running cx:until-unchanged pull-up
INFO: Running xslt pull-up-text
INFO: Running xslt pull-up-text
INFO: Running xslt pull-up-text
INFO: Running xslt pull-up-text
INFO: Running cx:until-unchanged unwrap
INFO: Running xslt unwrap-zeroormore
INFO: Running xslt unwrap-zeroormore
INFO: Running cx:until-unchanged sort
INFO: Running xslt sort-pe
INFO: Running xslt sort-pe
INFO: Running xslt sort-pe
INFO: Running xslt sort-pe
INFO: Running xslt sort-pe
INFO: Running xslt sort-pe
INFO: Running xslt sort-pe
INFO: Running xslt dtx2dtd
</pre>
               </div>
            </div>
            <div class="section">
               <h2 class="runin">What about a better W3C XML Schema? </h2>
               <p class="runin" id="p41">
                  <a id="xsd11" name="xsd11" shape="rect"/>The DTD that results from this conversion process is a little bit unsatisfying. It's just not what a human being would do if they started from scratch. On the other hand, it doesn't matter much; there's no widespread use for DTDs beyond validation and perhaps guided authoring. (And you ought to be using RELAX NG for that; see my previous comment about the twenty-first century.)</p>
               <p id="p42">The same is not true of W3C XML Schemas. There <em>would</em> (just possibly, maybe) be value in having a better XSD for DocBook. There are data binding tools and other applications that would fare much, much better with DocBook if they were given something that took proper advantage of XSD's native facilities.</p>
               <p id="p43">I've never had much interest in writing an XSD for DocBook. I'm not likely to ever be persuaded that “type inheritance” is a satisfying abstraction for how content models are related. I never felt that XSD 1.0 was a good foundation for the kind of “human prose” schemas of which DocBook is a typical example. But I'm almost convinced that XSD 1.1 has fixed some of the most inconvenient deficiencies.</p>
               <p id="p44">My bailing wire and duct tape solution for generating DTDs doesn't seem like it's ever going to be up to the task of doing the conversion properly. I'd be delighted if the aforementioned enterprising grad student built a tool to do the conversion automatically, but if not, I just might (someday) take a crack at hand authoring a proper XSD 1.1 schema for DocBook.</p>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>XProc Proposed Recommendation!</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2010/03/10/XProcProposedRecommendation"/>
      <id>http://norman.walsh.name/2010/03/10/XProcProposedRecommendation</id>
      <published>2010-03-10T11:01:16Z</published>
      <updated>2010-03-19T00:09:24Z</updated>
      <category term="w3c" scheme="http://technorati.com/tag/"/>
      <dc:subject>W3C</dc:subject>
      <category term="xml" scheme="http://technorati.com/tag/"/>
      <dc:subject>XML</dc:subject>
      <category term="xproc" scheme="http://technorati.com/tag/"/>
      <dc:subject>XProc</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>I'm pleased to report that <em class="citetitle">XProc: An XML Pipeline Language</em> is now a W3C Proposed Recommendation.</p>
         </div>
      </summary>
      <content type="xhtml"
               xml:base="http://norman.walsh.name/2010/03/10/XProcProposedRecommendation">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>I'm pleased to report that <em class="citetitle">XProc: An XML Pipeline Language</em> is now a W3C Proposed Recommendation.</p>
            </div>
            <p id="p1">The <a href="http://www.w3.org/2005/10/Process-20051014/tr.html#cfr" shape="rect">Proposed Recommendation</a> draft of <em class="citetitle">
                  <a href="http://www.w3.org/TR/2010/PR-xproc-20100309/" shape="rect">XProc: An XML Pipeline Language</a>
               </em> was published yesterday!</p>
            <p id="p2">I'll save a more reflective post about the process and the result for after we've crossed the last hurdle. In the meantime, here are a few useless statistics. Between start and PR:</p>
            <div class="itemizedlist">
               <ul>
                  <li>
                     <p id="p3">We had 169 meetings over 4 years and 73 days, give or take a day or two.</p>
                  </li>
                  <li>
                     <p id="p4">We produced 12 drafts: seven working drafts, two last call working drafts, two candidate recommendations, and one proposed recommendation.</p>
                  </li>
                  <li>
                     <p id="p5">I stood on the podium in front of a conference audience and asserted that we'd be finished “within a year” on at least three separate occasions. Maybe four.</p>
                  </li>
                  <li>
                     <p id="p6">We had four face-to-face meetings. <span class="personname">
                           <span class="firstname">Murray</span>
                        </span> kicked us off <a href="http://norman.walsh.name/2006/08/17/xprocwg" title="XProc WG Meeting"
                           shape="rect">in style</a> at his place and we met at three W3C Technical Plenary meetings.</p>
                  </li>
                  <li>
                     <p id="p7">The community generated two complete, interoperable implementations and a number of additional, partial implementations.</p>
                  </li>
               </ul>
            </div>
            <p id="p8">And, most important, we developed an active (and growing, I think) <a href="http://en.wikipedia.org/wiki/XML_pipeline"
                  title="Wikipedia: XML pipeline"
                  shape="rect">XProc</a>
               <a href="/knows/what/xproc" shape="rect">
                  <img border="0" alt="[L]" src="/graphics/linkgroup.gif"/>
               </a> user community. On the whole, a success by any metric, I think.</p>
            <p id="p9">Once again, I'd like to extend my congratulations and heartfelt thanks to the <a href="http://www.w3.org/2004/01/pp-impl/38398/status" shape="rect">members of the Working Group</a>, reviewers, and implementors that have helped us come this far. We couldn't have done it without you.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Wiki editing with XProc</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2010/03/07/wikiEdit"/>
      <id>http://norman.walsh.name/2010/03/07/wikiEdit</id>
      <published>2010-03-07T21:25:44Z</published>
      <updated>2010-03-07T22:26:26Z</updated>
      <category term="calabash" scheme="http://technorati.com/tag/"/>
      <dc:subject>Calabash</dc:subject>
      <category term="www" scheme="http://technorati.com/tag/"/>
      <dc:subject>TheWeb</dc:subject>
      <category term="xproc" scheme="http://technorati.com/tag/"/>
      <dc:subject>XProc</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>An example, for better or worse, of automating website interaction with XProc.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/03/07/wikiEdit">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>An example, for better or worse, of automating website interaction with XProc.</p>
            </div>
            <p id="p1">What happened was, the DocBook wiki broke. I don't know how or why, but it fell over. The problem, whatever it is, left the wiki immutable and the underlying database in a state of questionable consistency.</p>
             <p id="p2">Clearly a problem that had to be fixed. I set up a new wiki, running <a href="http://moinmo.in/" shape="rect">MoinMoin</a> 1.9.2 instead of <em>1.3.4</em> [Upgrade much? -ed].</p>
             <p id="p3">In theory, there's an upgrade path from 1.3.4 to 1.9.2, but I'm sufficiently unsure about the state of the current database that I'm loath to use it. The last thing I want to do is put the <em>new</em> wiki into some indeterminate state. Instead, I grabbed all the most recent pages from the old wiki, trimmed out a bunch of cruft, and cleaned up the markup a bit (the wiki markup seems to have changed over time).</p>
            <p id="p4">What I really wanted to do was add all these pages to the new wiki. Easy enough to do with a browser for one or two pages, but several hundred pages was way more than my patience would tolerate.</p>
             <p id="p5">A quick experiment with <a href="http://www.tuffcode.com/" shape="rect">HTTP Scoop</a> made it look pretty easy:</p>
            <div class="itemizedlist">
               <ul>
                  <li>
                     <p id="p6">Logging in sets a cookie.</p>
                  </li>
                  <li>
                     <p id="p7">Loading a page that doesn't exist provides a link that you can follow to create the page.</p>
                  </li>
                  <li>
                     <p id="p8">Following that link returns an HTML page containing a form with a place to type the wiki markup and a bunch of hidden fields.</p>
                  </li>
                  <li>
                     <p id="p9">Posting that form back to the server updates the page.</p>
                  </li>
               </ul>
            </div>
            <p id="p10">If only I had a tool that could make HTTP requests and process the results…wait, wait, I <em>have</em> one of those!</p>
             <p id="p11">XProc ought to be up to this job, yes? Yes! In fact, it was reasonably straightforward. Wanna see how it works? Of course you do. The following pipeline works in <a href="http://xmlcalabash.com/" shape="rect">XML Calabash</a> version 0.9.20 or later.</p>
            <p id="p12">I decided to pass the wiki markup as an input and the page name as an option. From the option, I construct the value of the URI for the page.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;p:declare-step version='1.0' xmlns:p="http://www.w3.org/ns/xproc" name="main"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:html="http://www.w3.org/1999/xhtml"&gt;
  &lt;p:input port="source"/&gt;
  &lt;p:output port="result"/&gt;
  &lt;p:option name="page" required="true"/&gt;

  &lt;p:variable name="pageuri" select="concat('http://wiki.example.com/',$page)"/&gt;
</pre>
            </div>
            <p id="p13">Next I have to login:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
  &lt;p:www-form-urlencode match="/c:request/c:body/text()"&gt;
    &lt;p:input port="source"&gt;
      &lt;p:inline&gt;
        &lt;c:request method="POST"
                   href="http://wiki.example.com/DocBookWikiWelcome"&gt;
          &lt;c:body content-type="application/x-www-form-urlencoded"&gt;@@HERE@@&lt;/c:body&gt;
        &lt;/c:request&gt;
      &lt;/p:inline&gt;
    &lt;/p:input&gt;
    &lt;p:input port="parameters"&gt;
      &lt;p:inline&gt;
        &lt;c:param-set&gt;
          &lt;c:param name="action" value="login"/&gt;
          &lt;c:param name="name" value="NormanWalsh"/&gt;
          &lt;c:param name="password" value="MYPASSWORD"/&gt;
          &lt;c:param name="login" value="Login"/&gt;
        &lt;/c:param-set&gt;
      &lt;/p:inline&gt;
    &lt;/p:input&gt;
  &lt;/p:www-form-urlencode&gt;

  &lt;p:http-request cx:cookies="login" name="login"/&gt;

  &lt;p:sink/&gt;
</pre>
            </div>
             <p id="p14">I reverse-engineered the way the login form works. I URL-encode my username, password, and the other form parameters and pass them to a <tt class="tag-starttag">&lt;p:http-request&gt;</tt> that POSTs them to the server.</p>
            <p id="p15">I don't care about the result, so I drop it on the floor with <tt class="tag-starttag">&lt;p:sink&gt;</tt>.</p>
            <p id="p16">I do care about cookies, so I have to store those somewhere. XML Calabash has an extension that lets you manage cookies in named sets. This <tt class="tag-starttag">&lt;p:http-request&gt;</tt> saves any cookies that come back in the “<tt class="literal">login</tt>” set.</p>
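             <p>For comparison, here's roughly the same login request assembled with Python's standard library. This is a sketch and has not been tested against MoinMoin. The URL and parameters are copied from the pipeline above, and <tt class="code">MYPASSWORD</tt> is still a placeholder. A <tt class="code">CookieJar</tt> stands in for the named “<tt class="literal">login</tt>” cookie set. The request is built but not sent.</p>
             <div class="programlisting">
               <pre xml:space="preserve">
import http.cookiejar
import urllib.request
from urllib.parse import urlencode

LOGIN_URL = "http://wiki.example.com/DocBookWikiWelcome"

# The same name/value pairs as the c:param-set in the pipeline.
# MYPASSWORD is a placeholder, as it is there.
login_params = [
    ("action", "login"),
    ("name", "NormanWalsh"),
    ("password", "MYPASSWORD"),
    ("login", "Login"),
]

# Encode the pairs as an application/x-www-form-urlencoded body,
# the job p:www-form-urlencode does in the pipeline.
body = urlencode(login_params)

# The CookieJar plays the part of the named "login" cookie set: an opener
# built with HTTPCookieProcessor stores cookies from responses and sends
# them with later requests made through the same opener.
login_cookies = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(login_cookies))

request = urllib.request.Request(
    LOGIN_URL,
    data=body.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

# opener.open(request) would send the login; it is deliberately not called.
print(body)
</pre>
             </div>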
            <p id="p17">Next, we have to get the page we want to edit.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
  &lt;p:string-replace match="/c:request/@href" cx:depends-on="login"&gt;
    &lt;p:input port="source"&gt;
      &lt;p:inline&gt;
        &lt;c:request method="GET" href="@@HERE@@"/&gt;
      &lt;/p:inline&gt;
    &lt;/p:input&gt;
    &lt;p:with-option name="replace" select="concat('&amp;quot;', $pageuri, '&amp;quot;')"/&gt;
  &lt;/p:string-replace&gt;

  &lt;p:http-request cx:cookies="login" name="getpage"/&gt;

  &lt;p:sink/&gt;
</pre>
            </div>
            <p id="p18">I use the “<tt class="literal">login</tt>” cookies so that the wiki knows who I am. I also use the <tt class="tag-attribute">cx:depends-on</tt> attribute to tell the processor that this step depends on the preceding login step, even though there's no dependency in the flow graph. Without this explicit statement about dependency, the processor might attempt to get the page before performing the login step.</p>
            <p id="p19">Once again, I don't care about the output so I drop it on the floor. In theory, I have to parse the output and find the “edit” link. In practice, I know how to create it without looking for it in the markup. I'm not even sure I have to do this step, but it is what a browser does and it was easy to do so I left it in.</p>
            <p id="p20">Now we want to get the page that includes the edit form:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
  &lt;p:string-replace match="/c:request/@href" cx:depends-on="getpage"&gt;
    &lt;p:input port="source"&gt;
      &lt;p:inline&gt;
        &lt;c:request method="GET" detailed="false" href="@@HERE@@"/&gt;
      &lt;/p:inline&gt;
    &lt;/p:input&gt;
    &lt;p:with-option name="replace" select="concat('&amp;quot;', $pageuri, '?action=edit&amp;quot;')"/&gt;
  &lt;/p:string-replace&gt;

  &lt;p:http-request cx:cookies="login" name="getpageedit"/&gt;
</pre>
            </div>
            <p id="p21">Again, we use the login cookies. And this time we don't drop the output on the floor because we have to extract the hidden fields from the page in order for our subsequent POST to work.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
  &lt;p:unescape-markup namespace="http://www.w3.org/1999/xhtml"
                     content-type="text/html" name="unescape"/&gt;

  &lt;p:for-each name="for-each"&gt;
    &lt;p:iteration-source select="//html:input[@type='hidden']"/&gt;
    &lt;p:output port="result"/&gt;

    &lt;p:string-replace match="c:param/@name"&gt;
      &lt;p:input port="source"&gt;
        &lt;p:inline&gt;&lt;c:param name="name" value="value"/&gt;&lt;/p:inline&gt;
      &lt;/p:input&gt;
      &lt;p:with-option name="replace" select="concat('&amp;quot;',/*/@name,'&amp;quot;')"/&gt;
    &lt;/p:string-replace&gt;

    &lt;p:string-replace match="c:param/@value"&gt;
      &lt;p:with-option name="replace" select="concat('&amp;quot;',/*/@value, '&amp;quot;')"&gt;
        &lt;p:pipe step="for-each" port="current"/&gt;
      &lt;/p:with-option&gt;
    &lt;/p:string-replace&gt;
  &lt;/p:for-each&gt;
</pre>
            </div>
            <p id="p22">To get the hidden fields, we unescape the markup. XML Calabash uses <a href="http://home.ccil.org/~cowan/XML/tagsoup/" shape="rect">TagSoup</a> for “<tt class="literal">text/html</tt>” pages, so we'll get well-formed XML.</p>
            <p id="p23">The <tt class="tag-starttag">&lt;p:for-each&gt;</tt> loop selects each of the hidden input fields and transforms them into <tt class="tag-starttag">&lt;c:param&gt;</tt> elements. We'll need those later.</p>
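             <p>Outside XProc, the same hidden-field extraction could be done with Python's <tt class="code">html.parser</tt> module. That module tolerates HTML that isn't well-formed and never builds a tree. Here is a minimal sketch; the class and function names are hypothetical. It returns (name, value) pairs, which is the same information the loop above puts into <tt class="tag-starttag">&lt;c:param&gt;</tt> elements.</p>
             <div class="programlisting">
               <pre xml:space="preserve">
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Record (name, value) for every input element whose type is hidden."""

    def __init__(self):
        super().__init__()
        self.params = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        attributes = dict(attrs)
        if tag != "input":
            return
        if (attributes.get("type") or "").lower() != "hidden":
            return
        name = attributes.get("name")
        if name is not None:
            # A hidden field with no value attribute submits an empty string.
            self.params.append((name, attributes.get("value") or ""))


def hidden_fields(html_text):
    """Return the hidden form fields in html_text, in document order."""
    collector = HiddenFieldCollector()
    collector.feed(html_text)
    collector.close()
    return collector.params
</pre>
             </div>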
            <p id="p24">Next, we have to construct the <tt class="tag-starttag">&lt;c:param&gt;</tt> for the “<tt class="literal">savetext</tt>” parameter that contains our wiki markup. This one's a bit tricky.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
  &lt;p:string-replace name="savetext" match="/c:param/@value"&gt;
    &lt;p:input port="source"&gt;
      &lt;p:inline&gt;
        &lt;c:param name="savetext" value="@@HERE@@"/&gt;
      &lt;/p:inline&gt;
    &lt;/p:input&gt;
    &lt;p:with-option name="replace" select='concat("&amp;apos;",replace(c:data,"&amp;apos;","&amp;apos;&amp;apos;"),"&amp;apos;")'&gt;
      &lt;p:pipe step="main" port="source"/&gt;
    &lt;/p:with-option&gt;
  &lt;/p:string-replace&gt;
</pre>
            </div>
            <p id="p25">What the hell, I hear you ask, is up with that “<tt class="literal">replace</tt>” value?</p>
            <p id="p26">Well, see, what's going to appear on the <tt class="literal">source</tt> input port of our pipeline is a <tt class="tag-starttag">&lt;c:data&gt;</tt> element that contains the wiki markup of the page. The <tt class="option">replace</tt> option <em>is evaluated</em> as an XPath expression, so we have to “quote” the value to turn it into a string literal. This is a common idiom in <tt class="tag-starttag">&lt;p:string-replace&gt;</tt>
               <sup class="footnote">[<a name="p26.6" href="#ftn.p26.6" id="p26.6" shape="rect">1</a>]</sup>. In this case, however, <em>the value</em> may contain both double and single quotes, so we need to make sure that they don't result in an invalid XPath expression!</p>
            <p id="p28">Imagine that this is our <tt class="tag-starttag">&lt;c:data&gt;</tt> element:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;c:data&gt;"Hello 'world'"&lt;/c:data&gt;
</pre>
            </div>
            <p id="p29">If we do the usual quoting trick, the resulting XPath expression will be:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
'"Hello 'world'"'
</pre>
            </div>
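            <p id="p30">and that's not a syntactically valid XPath string literal, because the apostrophe before <tt class="literal">world</tt> ends it early. So we use <tt class="function">replace</tt> to double up the apostrophes. XPath 2.0 reads two adjacent apostrophes inside an apostrophe-delimited literal as a single apostrophe. That gives us</p>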
            <div class="programlisting">
               <pre xml:space="preserve">
'"Hello ''world''"'
</pre>
            </div>
            <p id="p31">which is what we want. That took me a minute or two, believe you me.</p>
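            <p>The quoting rule is small enough to state as a function. This Python sketch does what the <tt class="function">concat</tt>/<tt class="function">replace</tt> expression above does. The name <tt class="literal">xpath_string_literal</tt> is invented for illustration. Doubling the delimiter is an XPath 2.0 escape. XPath 1.0 string literals have no escape mechanism at all.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
def xpath_string_literal(text):
    """Quote text as an apostrophe-delimited XPath 2.0 string literal.

    Inside such a literal, a doubled apostrophe stands for one apostrophe,
    so each apostrophe in the text is doubled; double quotes need no escape.
    """
    return "'" + text.replace("'", "''") + "'"


print(xpath_string_literal("\"Hello 'world'\""))  # prints '"Hello ''world''"'
</pre>
            </div>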
            <p id="p32">Next we wrap all our <tt class="tag-starttag">&lt;c:param&gt;</tt> elements in a <tt class="tag-starttag">&lt;c:param-set&gt;</tt>, construct a <tt class="tag-starttag">&lt;c:request&gt;</tt> to hold them, and use <tt class="tag-starttag">&lt;p:www-form-urlencode&gt;</tt> to encode them.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
  &lt;p:wrap-sequence name="wrap" wrapper="c:param-set"&gt;
    &lt;p:input port="source"&gt;
      &lt;p:pipe step="for-each" port="result"/&gt;
      &lt;p:pipe step="savetext" port="result"/&gt;
    &lt;/p:input&gt;
  &lt;/p:wrap-sequence&gt;

  &lt;p:string-replace match="/c:request/@href" cx:depends-on="wrap"&gt;
    &lt;p:input port="source"&gt;
      &lt;p:inline&gt;
        &lt;c:request method="POST" detailed="true" href="@@HERE@@"&gt;
          &lt;c:body content-type="application/x-www-form-urlencoded"&gt;@@HERE@@&lt;/c:body&gt;
        &lt;/c:request&gt;
      &lt;/p:inline&gt;
    &lt;/p:input&gt;
    &lt;p:with-option name="replace" select="concat('&amp;quot;', $pageuri, '&amp;quot;')"/&gt;
  &lt;/p:string-replace&gt;

  &lt;p:www-form-urlencode match="/c:request/c:body/text()"&gt;
    &lt;p:input port="parameters"&gt;
      &lt;p:pipe step="wrap" port="result"/&gt;
    &lt;/p:input&gt;
  &lt;/p:www-form-urlencode&gt;
</pre>
            </div>
            <p id="p33">Send that off to the server and we're done!</p>
            <div class="programlisting">
               <pre xml:space="preserve">
  &lt;p:http-request cx:cookies="login"/&gt;

  &lt;p:delete match="/c:response/*"/&gt;

&lt;/p:declare-step&gt;
</pre>
            </div>
            <p id="p34">I display the result, after deleting everything inside the <tt class="tag-starttag">&lt;c:response&gt;</tt> element, just to make sure that I got a 200 back.</p>
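            <p>For comparison, the last few steps look roughly like this in Python: gather the parameters, form-encode them, and build the POST. This is a sketch under assumptions. <tt class="literal">build_save_request</tt> is a hypothetical helper, and <tt class="literal">hidden</tt> is the list of name/value pairs scraped from the edit page. Only the <tt class="literal">savetext</tt> parameter name comes from the pipeline above.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
from urllib.parse import urlencode
from urllib.request import Request


def build_save_request(page_uri, hidden, savetext):
    """Build a form-encoded POST carrying the hidden fields plus the new markup."""
    params = list(hidden) + [("savetext", savetext)]
    body = urlencode(params).encode("utf-8")  # spaces become "+", quotes become %22 and %27
    return Request(page_uri, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})
</pre>
            </div>
            <p>To send it with the login cookies, open it through an opener built with <tt class="literal">urllib.request.HTTPCookieProcessor</tt> and check that the response status is 200. No XPath quoting is needed on this route, because the markup never passes through an XPath expression.</p>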
            <p id="p35">That little XProc script got all the pages loaded in just a couple of minutes. FTW!</p>
            <p id="p36">If you're interested, the <a href="examples/wikiedit.xpl" shape="rect">whole script</a> is available.</p>
            <div class="footnotes">
               <hr width="100" align="left" class="footnotes-divider"/>
               <div class="footnote">
                  <p id="p27">
                     <sup>[<a href="#p26.6" name="ftn.p26.6" id="ftn.p26.6" shape="rect">1</a>]</sup>So common that I regret not providing some sort of syntactic shortcut for it. Oh, well, there's always version 1.1.</p>
               </div>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Where am I?</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2010/03/06/where"/>
      <id>http://norman.walsh.name/2010/03/06/where</id>
      <published>2010-03-06T21:57:48Z</published>
      <updated>2010-03-07T00:17:50Z</updated>
      <dc:subject>SelfReference</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Or, perhaps more to the point, where was I? And where will I be?</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/03/06/where">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Or, perhaps more to the point, where was I? And where will I be?</p>
            </div>
            <p id="p1">I've long been fascinated by geospatial data. I do <a href="http://en.wikipedia.org/wiki/Geocaching" title="Wikipedia: Geocaching"
                  shape="rect">a little geocaching</a>
               <a href="/knows/what/Geocaching" shape="rect">
                  <img border="0" alt="[L]" src="/graphics/linkgroup.gif"/>
               </a>. I keep track of <a href="http://norman.walsh.name/2009/10/05/dominicanrepublic"
                  title="Dominican Republic"
                  shape="rect">the countries</a> I've been in. I use services like <a href="http://www.dopplr.com/" shape="rect">Dopplr</a> and <a href="http://www.tripit.com/" shape="rect">TripIt</a> to keep track of my itineraries. I carry a GPS to <a href="http://en.wikipedia.org/wiki/Geotagging" title="Wikipedia: Geotagging"
                  shape="rect">geotag photographs</a>.</p>
            <p id="p2">Carrying a mobile phone with a GPS allows me to explore the features of <a href="http://en.wikipedia.org/wiki/Geosocial_networking"
                  title="Wikipedia: Geosocial networking"
                  shape="rect">geosocial applications</a> like <a href="http://www.brightkite.com/" shape="rect">Brightkite</a>, <a href="http://www.gowalla.com/" shape="rect">Gowalla</a>, and <a href="http://www.foursquare.com/" shape="rect">Foursquare</a>. I allow Dopplr and Brightkite to update my <a href="http://fireeagle.yahoo.com/" shape="rect">Fireeagle</a> location.</p>
            <p id="p3">All very nice, but there were two obvious (to me) deficiencies in this arrangement. First, and most obvious, <em>my data</em> is spread all over <em>someone else's</em> servers. I consider this unacceptable. Any one of these services could get bought, go belly up, or <a href="http://www.wired.com/epicenter/2009/01/magnolia-suffer/" shape="rect">carelessly</a> (or <a href="http://help.yahoo.com/l/us/yahoo/geocities/close/close-03.html"
                  shape="rect">maliciously</a>, I suppose) discard all my data.</p>
            <p id="p4">The second point is less obvious: how do I tell when I'm home? I don't actually think any of you reading this would have a hard time working out where I live. I'm sure I've left enough digital clues for anyone sufficiently interested to work it out. That said, I can't actually convince myself that it's reasonable to “check in” to any of these geolocation services when I'm home. I'm not sure <a href="http://pleaserobme.com/" shape="rect">it's reasonable</a> to do when I'm not home, either, but I do. I think
I've set the services up so they only reveal my location to friends anyway.</p>
            <p id="p5">Not being able to solve the second of these problems made the data sufficiently unreliable (I've been checked into Staples for eleven days?) that the first problem hadn't crossed my “do something about it” threshold.</p>
            <p id="p6">All that changed a few days ago when <span class="personname">
                  <span class="firstname">Tom</span> 
                  <span class="surname">Morris</span>
               </span> happened to mention a clever solution in his Twitter stream. He later documented it in <a href="http://tommorris.org/blog/2010/02/22#When:19:34:55" shape="rect">a thoughtful post</a>. The basic idea is this: if you always carry your phone (or other device) on your person, then the presence of that device in your house means you're home.</p>
            <p id="p7">A <a href="http://gist.github.com/311495" shape="rect">little hack</a> later and my server always knows when I'm home, give or take a few hours. It can only ping the device when it's on and not in “standby”, so there's some latency. But not more than a few hours most days, I expect.</p>
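            <p>The decision rule itself is easy to sketch. The following Python fragment is a guess at its shape, not the code in the gist. It assumes something else records when the phone last answered a ping on the home network. The four-hour allowance for “standby” is an arbitrary, hypothetical default.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
from datetime import datetime, timedelta, timezone


def is_home(last_seen, now, allowance=timedelta(hours=4)):
    """Guess whether I'm home from the phone's most recent successful ping.

    last_seen is None if the phone has never answered; allowance covers the
    time the phone may spend asleep (and unpingable) while I'm still home.
    """
    if last_seen is None:
        return False
    return allowance >= now - last_seen


now = datetime(2010, 3, 6, 22, 0, tzinfo=timezone.utc)
print(is_home(now - timedelta(hours=1), now))  # True
print(is_home(now - timedelta(hours=6), now))  # False
</pre>
            </div>
            <p>A longer allowance covers more standby time, but the system then takes longer to notice that I've left.</p>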
            <p id="p8">Having solved the second problem, I turned my attention to the first. A few hours of hacking later and the TripIt, Foursquare, Gowalla, Brightkite, and Fireeagle APIs are giving me my data. The hardest part, honestly, was getting over the authentication hurdles. <a href="http://en.wikipedia.org/wiki/OAuth" title="Wikipedia: OAuth" shape="rect">OAuth</a> may be the right answer, but it's not painless to set up a new application.</p>
            <p id="p9">I started out by pouring all this data into <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a>. (Well, I would, wouldn't I?) A little XQuery later and I had a normalized view of all my locations. Cool.</p>
            <p id="p10">But wait, I thought, what about all those GPS tracks? Yes, those belong in there as well! Easily done.</p>
            <p id="p11">The interesting thing about GPS tracks is that you can (sometimes) interpolate data between points. I do this already when I'm geotagging photographs. By adding “next point” to the normalized data when appropriate, I could expose that in my system as well.</p>
            <p id="p12">Once that idea was in place, it was clear that an airline flight or train ride (to a lesser extent) might be subject to interpolation as well. A quick tweak to the scripts that normalize TripIt itinerary data took care of that.</p>
            <p id="p13">At the end of the day, I have an interesting (to me) personal archive of my geolocation over time. It's derived from GPS tracks, explicit checkins, and itineraries. I'm also going to integrate the GPS data that comes from photographs taken with my mobile phone. All very cool to me.</p>
            <p id="p14">I've also got a web service that I can use for geotagging photographs. I can ask, for example, where was I this morning at 10:00a?</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;point lat="42.360633" long="-72.543451"
       timestamp="2010-03-06T14:51:53Z"
       duration="PT8M7S" seconds="487"&gt;Staples&lt;/point&gt;
</pre>
            </div>
            <p id="p15">Apparently, I was at Staples and had been for 8 minutes. No wonder I'm the mayor of Staples.</p>
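            <p>The arithmetic behind that answer checks out. On 2010-03-06, US Eastern time was still on standard time (UTC-5), so 10:00a is 15:00:00Z. Subtracting the 14:51:53Z timestamp leaves 487 seconds, or <tt class="literal">PT8M7S</tt>. Below is a minimal Python sketch of such a lookup. The function names and the sorted list of (timestamp, place) check-ins are assumptions for illustration, not how the MarkLogic service is built. <tt class="literal">iso_duration</tt> assumes a non-negative number of seconds.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import bisect
from datetime import datetime, timezone


def iso_duration(seconds):
    """Format whole seconds as an ISO 8601 duration, e.g. 487 as PT8M7S."""
    days, rest = divmod(int(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    time_part = "".join(f"{n}{unit}"
                        for n, unit in ((hours, "H"), (minutes, "M"), (secs, "S"))
                        if n)
    if days and not time_part:
        return f"P{days}D"
    return "P" + (f"{days}D" if days else "") + "T" + (time_part or "0S")


def where_was_i(checkins, when):
    """Return the latest check-in at or before `when`, with its elapsed time.

    checkins is a list of (timestamp, place) pairs sorted by timestamp.
    """
    times = [t for t, _ in checkins]
    i = bisect.bisect_right(times, when)
    if i == 0:
        return None  # nothing recorded yet at that moment
    start, place = checkins[i - 1]
    seconds = int((when - start).total_seconds())
    return {"place": place,
            "timestamp": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "seconds": seconds,
            "duration": iso_duration(seconds)}


checkins = [(datetime(2010, 3, 6, 14, 51, 53, tzinfo=timezone.utc), "Staples")]
print(where_was_i(checkins, datetime(2010, 3, 6, 15, 0, tzinfo=timezone.utc)))
</pre>
            </div>
            <p>The same formatter also produces the other durations in this post, such as <tt class="literal">PT3H55M</tt> for 14100 seconds and <tt class="literal">P1DT1H55M</tt> for 93300 seconds.</p>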
            <p id="p16">If I ask where I was at 2006-07-15T18:28:00Z, the answer comes from a GPS track:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;path start-lat="42.376713753" start-long="-72.516739368"
      end-lat="42.376821041" end-long="-72.516653538"
      timestamp="2006-07-15T18:27:58Z" end-timestamp="2006-07-15T18:28:03Z"
      total-distance="0.00842566695064306"
      total-duration="PT5S" total-seconds="5" velocity="6.06648020446301"
      duration="PT2S" seconds="2" distance="0.00337026678025723"
      lat="42.3768" long="-72.5167"/&gt;
</pre>
            </div>
            <p id="p17">That's a location interpolated over five seconds and about 44 feet. Seems pretty reasonable.</p>
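            <p>Those numbers are consistent if the distances are in miles. 0.00842566695064306 miles is about 44.5 feet, and covering it in 5 seconds is about 6.07 mph. Two seconds in is 2/5 of the way along the segment. Here is a minimal Python sketch of that straight-line interpolation. The function name is hypothetical, and the miles assumption is made explicit in the code.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
def interpolate_segment(start_lat, start_long, end_lat, end_long,
                        total_seconds, seconds, total_distance):
    """Interpolate a position `seconds` into a track segment.

    Assumes total_distance is in miles and that the segment is short enough
    for straight-line interpolation of latitude and longitude to be fine.
    """
    fraction = seconds / total_seconds
    return {
        "velocity": total_distance / total_seconds * 3600,  # miles per hour
        "distance": total_distance * fraction,               # miles covered so far
        "lat": start_lat + (end_lat - start_lat) * fraction,
        "long": start_long + (end_long - start_long) * fraction,
    }


p = interpolate_segment(42.376713753, -72.516739368, 42.376821041, -72.516653538,
                        total_seconds=5, seconds=2, total_distance=0.00842566695064306)
print(round(p["lat"], 4), round(p["long"], 4))  # 42.3768 -72.5167
</pre>
            </div>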
            <p id="p18">Interpolating over airline flights is a little less precise:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;path start-lat="42.363611" start-long="-71.006111"
      end-lat="18.3375" end-long="-64.969444"
      timestamp="2010-01-28T14:05:00Z" end-timestamp="2010-01-28T18:00:00Z"
      total-distance="1697.2841796875"
      total-duration="PT3H55M" total-seconds="14100" velocity="433.349152260638"
      duration="PT55M" seconds="3300" distance="397.236722905585"
      lat="36.7753" long="-69.2873"&gt;BOS&lt;/path&gt;
</pre>
            </div>
            <p id="p19">but it's still kind of cool. Also interesting is the fact that itinerary data lets me look forward. Where will I be at 2010-03-12T08:25:00Z?</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;point lat="50.1" long="14.266667"
       timestamp="2010-03-12T08:25:00Z"
       duration="P1DT1H55M" seconds="93300"&gt;PRG&lt;/point&gt;
</pre>
            </div>
            <p id="p20">Oh yes. I think I have <a href="http://www.xmlprague.cz/2010/sessions.html#Automating-Document-Assembly-in-DocBook"
                  shape="rect">an appointment</a> there. And it'll be more accurate after I've actually checked in, taken photographs, and used my GPS. Sweet.</p>
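            <p>Over a flight, straight-line interpolation of the coordinates stops being a good model. On the flight out of BOS above, a linear blend of the endpoint coordinates 55 minutes into the 3h55m leg would give roughly (36.74, -69.59). The record shows (36.7753, -69.2873), so that point isn't a simple average. One common alternative is to move along the great circle, which is what the Python sketch below does. The post doesn't say whether the author's scripts use this exact formula, and the Earth radius is an assumption.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; an assumption, the post doesn't say


def central_angle(lat1, lon1, lat2, lon2):
    """Angle in radians between two points on a sphere (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * math.asin(math.sqrt(a))


def great_circle_point(lat1, lon1, lat2, lon2, fraction):
    """Point `fraction` of the way from point 1 to point 2 along the great circle.

    Undefined for antipodal endpoints, where the great circle isn't unique.
    """
    delta = central_angle(lat1, lon1, lat2, lon2)
    if delta == 0:
        return lat1, lon1
    p1, l1 = math.radians(lat1), math.radians(lon1)
    p2, l2 = math.radians(lat2), math.radians(lon2)
    a = math.sin((1 - fraction) * delta) / math.sin(delta)
    b = math.sin(fraction * delta) / math.sin(delta)
    # Blend the two endpoints as 3-D unit vectors, then convert back.
    x = a * math.cos(p1) * math.cos(l1) + b * math.cos(p2) * math.cos(l2)
    y = a * math.cos(p1) * math.sin(l1) + b * math.cos(p2) * math.sin(l2)
    z = a * math.sin(p1) + b * math.sin(p2)
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))


start, end = (42.363611, -71.006111), (18.3375, -64.969444)
print(EARTH_RADIUS_MILES * central_angle(*start, *end))  # roughly 1697 miles
print(great_circle_point(*start, *end, 3300 / 14100))
</pre>
            </div>
            <p>On a five-second GPS segment the two methods give practically the same answer. On a 1,700-mile flight they don't.</p>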
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>What your drive knows, and what it doesn't</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2010/03/01/driveKnows"/>
      <id>http://norman.walsh.name/2010/03/01/driveKnows</id>
      <published>2010-03-01T20:39:11Z</published>
      <updated>2010-03-01T22:06:21Z</updated>
      <category term="laptop" scheme="http://technorati.com/tag/"/>
      <dc:subject>Laptop</dc:subject>
      <category term="osx" scheme="http://technorati.com/tag/"/>
      <dc:subject>OSX</dc:subject>
      <dc:subject>SelfReference</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>I recently had occasion to swap hard drives between two essentially identical laptops. A surprising number of apps knew the difference.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/03/01/driveKnows">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>I recently had occasion to swap hard drives between two essentially identical laptops. A surprising number of apps knew the difference.</p>
            </div>
            <p id="p1">I have two essentially identical laptops, loaded 17” MacBook Pros. One is my personal machine and one belongs to <a href="http://www.marklogic.com/" shape="rect">Mark Logic</a>. I use my personal machine most of the time, but the fans have gotten insanely loud. I reported this as a warranty issue and got approval to take it in for service. (Yay!)</p>
            <p id="p2">Not wanting to be without my laptop for several days, I decided to be clever and swap hard drives. The <a href="http://eshop.macsales.com/installvideos/" shape="rect">install videos</a> at <a href="http://macsales.com/" shape="rect">Other World Computing</a> couldn't be more straightforward. (I had already bought the appropriate tools when I upgraded to a 500GB drive.)</p>
            <p id="p3">Following the swap, I found a curious mixture of systems and applications that could tell.</p>
            <div class="itemizedlist">
               <ul>
                  <li>
                     <p id="p4">Not surprisingly, the iTunes authentication system could tell.</p>
                  </li>
                  <li>
                     <p id="p5">Also not surprising, Time Machine could tell. I suppose it's just possible that two folks could have absolutely identical but physically different hard drives. I really don't know how it could tell, though.</p>
                  </li>
                  <li>
                     <p id="p6">Some of the menu bar widgets were different. Apparently whatever makes the little flag icon is in flash memory somewhere and not configured from the hard drive?</p>
                  </li>
                  <li>
                     <p id="p7">The one that really surprised me was VMware Fusion. It didn't like any of my virtual disks; it thought they were all “in use”. Stealing them back and telling VMware I'd “moved” them was enough to recover, though, so no harm done.</p>
                  </li>
               </ul>
            </div>
            <p id="p8">My laptop is back and the drives have been switched again. I wish the problem had been fixed, but they claimed to be unable to reproduce it. (Boo!) I can tell you <em>right now</em> that it's still a problem.</p>
            <p id="p9">I should push harder to get the fans replaced; the intermittent high-pitched whine is definitely a Bad Thing. Of course, the 17” MacBook Pro is due for a refresh any day now, right?</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Demo Jam at XML Prague!</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2010/02/23/demojam"/>
      <id>http://norman.walsh.name/2010/02/23/demojam</id>
      <published>2010-02-23T20:13:39Z</published>
      <updated>2010-02-23T20:43:39Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <category term="xml" scheme="http://technorati.com/tag/"/>
      <dc:subject>XML</dc:subject>
      <category term="xmlprague" scheme="http://technorati.com/tag/"/>
      <dc:subject>XMLPrague2010</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Demo Jam was a huge success at Balisage last year, so we're going to give it a go at XML Prague too!</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/02/23/demojam">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Demo Jam was a huge success at Balisage last year, so we're going to give it a go at XML Prague too!</p>
            </div>
            <p id="p1">If you're coming to <a href="http://www.xmlprague.cz/" shape="rect">XML Prague</a>, please plan to come to demo jam!</p>
            <p id="p2">Here's the scoop: we provide the microphone and the projector. You provide the laptop and the demo. You get five minutes (that's 300 seconds, no more) to demo anything you want, as long as you can reasonably claim that there's some XML in there somewhere. We'd love to see some XQuery and maybe even <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a> (you could use the <a href="http://developer.marklogic.com/about/whatiscis.xqy#editions" shape="rect">free Community
License</a>, for example), but that's not necessary.</p>
            <p id="p3">Judging is by audience participation: if you make the crowd cheer the loudest, you win. Winner walks away with a free pass to the <a href="http://www.marklogic.com/UserConference2010/" shape="rect">Mark Logic User Conference</a>, May 4-6 in San Francisco, CA, US. Everyone gets a T-shirt.</p>
            <p id="p4">Bring your demo and your posse to <a href="http://www.xmlprague.cz/2010/xmlprague-night.html" shape="rect">the social evening</a>; demo jam starts at 8:30p. See you there!</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>XML FTW!</title>
      <link rel="alternate" type="text/html" href="http://norman.walsh.name/2010/01/25/xml"/>
      <id>http://norman.walsh.name/2010/01/25/xml</id>
      <published>2010-01-25T22:21:37Z</published>
      <updated>2010-01-25T23:12:04Z</updated>
      <dc:subject>Software</dc:subject>
      <category term="xml" scheme="http://technorati.com/tag/"/>
      <dc:subject>XML</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>On the serendipitous joy of finding XML.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/01/25/xml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>On the serendipitous joy of finding XML.</p>
            </div>
            <p id="p1">As I've <a href="http://norman.walsh.name/2009/11/01/evernote#p7" shape="rect">said</a> 
               <a href="http://norman.walsh.name/2008/12/08/whichEndIsUp#p6" shape="rect">before</a>, I'm <em>very reluctant</em> to use your application if it's a roach motel for <em>my</em> data. It would not be fair to say that I'll <em>refuse</em> to use your application; it's just a lot less likely.</p>
            <p id="p2">For example, when it came to <a href="http://norman.walsh.name/2010/01/25/gsd" title="GSD!" shape="rect">GSD</a>, I decided that open access wasn't as important as picking an application that I'd actually use. If I let myself get distracted by exploring APIs, there'd be other things not getting done! (Priorities!)</p>
            <p id="p3">Having made my bed, I figured I should see what I was lying in. Today I took a peek at how <a href="http://en.wikipedia.org/wiki/OmniFocus" title="Wikipedia: OmniFocus"
                  shape="rect">OmniFocus</a> stores data. Now, the title of this essay no doubt gives away the punch line, so consider for a moment how this would have been done in the time before XML.</p>
            <p id="p4">
               <em>…go on, have a think, I'll wait…</em>
            </p>
            <p id="p5">In my experience it would probably have been in some proprietary format, almost certainly binary, and utterly opaque. How many tools document(ed) their proprietary data formats? On some platforms, there might have been system services for storing data, some sort of platform-supported database perhaps. Those systems are (often) only marginally better. They produce, instead of an opaque stream of bits, an opaque stream of atomic values. (Don't get me wrong: I've done the reverse-engineering thing on binary formats, and I'd prefer the stream of atomic values, believe you me.)</p>
            <p id="p6">What did I find when I went looking at the OmniFocus data? A directory full of ZIP files. And what's in each ZIP file? Why <tt class="filename">contents.xml</tt>, of course!</p>
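            <p>That's enough to poke at the data without any special tools. Here is a minimal Python sketch that assumes only two things: what's described here (a directory of ZIP files, each holding a <tt class="filename">contents.xml</tt>), and the <tt class="literal">task</tt>/<tt class="literal">name</tt> structure in the excerpt below. It matches on local names, so it doesn't care whether the real files use a namespace. The function names are made up.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def local_name(tag):
    """Drop an ElementTree "{namespace}" prefix, if there is one."""
    return tag.rsplit("}", 1)[-1]


def task_names(data_dir):
    """Yield (archive name, task name) for each named task in each ZIP file."""
    for archive in sorted(Path(data_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            root = ET.fromstring(zf.read("contents.xml"))
        for element in root.iter():
            if local_name(element.tag) != "task":
                continue
            # Tasks that merely point at a parent (task idref="...") have no name.
            for child in element:
                if local_name(child.tag) == "name":
                    yield archive.name, child.text
</pre>
            </div>
            <p>Point it at the directory that holds the ZIP files. The cross-references between tasks, projects, and contexts are left for the mental gymnastics.</p>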
            <p id="p7">Now, it would not be fair to assert that this is perfectly transparent. XML isn't magic. There are clearly some cross-reference relationships in there that will take a little mental gymnastics to decode. But still, I'll trade this:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
...
&lt;task id="pJhk6REkEHC" op="update"&gt;
  &lt;task idref="ggQv63WgCbw"/&gt;
  &lt;added&gt;2010-01-21T16:23:08.983Z&lt;/added&gt;
  &lt;modified&gt;2010-01-24T21:01:41.632Z&lt;/modified&gt;
  &lt;name&gt;Add server-side support for multipart MIME to tests.xproc.org&lt;/name&gt;
  &lt;rank&gt;2113929216&lt;/rank&gt;
  &lt;context idref="jYnYAAVroBT"/&gt;
  &lt;due&gt;2010-01-27T22:00:00.000Z&lt;/due&gt;
  &lt;completed&gt;2010-01-24T21:01:41.622Z&lt;/completed&gt;
  &lt;order&gt;parallel&lt;/order&gt;
&lt;/task&gt;
...
</pre>
            </div>
            <p id="p8">for <em>anything</em> I would <em>ever have gotten</em> at <em>any other point</em> in the history of file formats!</p>
            <p id="p9">XML has its detractors. It would not be fair to say they are all wrong. But I'll take XML over fair any day!</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>GSD!</title>
      <link rel="alternate" type="text/html" href="http://norman.walsh.name/2010/01/25/gsd"/>
      <id>http://norman.walsh.name/2010/01/25/gsd</id>
      <published>2010-01-25T15:13:26Z</published>
      <updated>2010-01-25T19:37:50Z</updated>
      <category term="osx" scheme="http://technorati.com/tag/"/>
      <dc:subject>OSX</dc:subject>
      <dc:subject>Software</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Our engineering department has a project management philosophy they describe as GSD. I aspire to GSD.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/01/25/gsd">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Our engineering department has a project management philosophy they describe as GSD. I aspire to GSD.</p>
            </div>
            <p id="p1">For me, the part of GSD<sup class="footnote">[<a name="p1.1" href="#ftn.p1.1" id="p1.1" shape="rect">1</a>]</sup> that I most often have difficulty with is keeping track of what needs doing. My todo (or want-todo) list is absurdly long. If I feel like castigating myself, I can always find a few things on my list that <em>should</em> have been done by now. It's not that I don't work hard or get a lot done, it's that I don't always prioritize perfectly and sometimes things slip through the cracks.</p>
            <p id="p3">I've been trying to get better at this. Having an online calendar sync'd with my phone keeps me from accidentally missing meetings and phone calls, so it seems to follow that some sort of online system should be able to help me with my todo list.</p>
            <p id="p4">My requirements are pretty simple: I want something that's easy to use and I want something that syncs with my mobile device. An online tool is almost, but not quite, as good as something that I can use offline on my PDA.</p>
            <p id="p5">I don't subscribe to any particular <a href="http://en.wikipedia.org/wiki/Getting%20Things%20Done"
                  title="Wikipedia: Getting Things Done"
                  shape="rect">Getting Things Done</a> methodology. Maybe I'll get there someday, but that's not my immediate goal.</p>
            <p id="p6">I played with <a href="http://www.rememberthemilk.com/" shape="rect">Remember The Milk</a> on-and-off last year. It seemed to work pretty well for simple lists, but I wasn't using it consistently because, I think, it wasn't quite powerful enough.</p>
            <p id="p7">This month, I took a few different systems for a test drive: <a href="http://www.2doapp.com/en/2Do/overview.html" shape="rect">2Do</a>, <a href="http://www.toodledo.com/" shape="rect">Toodledo</a>, <a href="http://culturedcode.com/things/" shape="rect">Things</a>, and <a href="http://www.omnigroup.com/applications/omnifocus/" shape="rect">OmniFocus</a>.</p>
            <p id="p8">Unfortunately, 2Do is only an iPhone app. It appears that there are plans for the next version to support syncing with Toodledo, but that doesn't exist today. Toodledo is a web-based app and is quite nice, probably plenty sufficient for my needs. On the desktop front, both <a href="http://en.wikipedia.org/wiki/Things_%28application%29"
                  title="Wikipedia: Things (application)"
                  shape="rect">Things</a> and <a href="http://en.wikipedia.org/wiki/OmniFocus" title="Wikipedia: OmniFocus"
                  shape="rect">OmniFocus</a> are probably
plenty sufficient as well. (There are no doubt other similar applications; those are just the ones I happened to try. I didn't attempt an exhaustive survey. I've GStD!)</p>
            <p id="p9">And the winner is: OmniFocus, by a narrow margin. I like the project/context duality that OmniFocus uses (Toodledo has contexts too, if you turn them on). Mostly it boiled down to the UI: I liked the “feel” of OmniFocus best.</p>
            <p id="p10">This is an app I plan to <em>force myself to use</em>, so I figured I'd best pick one that felt good. It's also the most expensive, by a pretty wide margin, but c'est la vie.</p>
            <p id="p11">Will this really work for me? Time will tell. But so far, so good. And I'm already learning to use it in ways I hadn't planned: maintaining shopping lists and travel check lists. Those aren't the sorts of things for which I would have actively sought out software (sometimes a pencil and a piece of paper really is enough), but it's encouraging to me that I have other reasons to be paying attention to my GSD tool.</p>
            <div class="footnotes">
               <hr width="100" align="left" class="footnotes-divider"/>
               <div class="footnote">
                  <p id="p2">
                     <sup>[<a href="#p1.1" name="ftn.p1.1" id="ftn.p1.1" shape="rect">1</a>]</sup>Getting Shi␈␈␈Stuff Done!</p>
               </div>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>XML Prague 2010</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2010/01/24/xmlprague"/>
      <id>http://norman.walsh.name/2010/01/24/xmlprague</id>
      <published>2010-01-24T18:27:40Z</published>
      <updated>2010-01-24T18:47:00Z</updated>
      <category term="docbook" scheme="http://technorati.com/tag/"/>
      <dc:subject>DocBook</dc:subject>
      <category term="xmlprague" scheme="http://technorati.com/tag/"/>
      <dc:subject>XMLPrague2010</dc:subject>
      <category term="xproc" scheme="http://technorati.com/tag/"/>
      <dc:subject>XProc</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>See you at XML Prague! And a chance to plug some really excellent training.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/01/24/xmlprague">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>See you at XML Prague! And a chance to plug some really excellent training.</p>
            </div>
            <p id="p1">I'm delighted that my paper proposal for <a href="http://www.xmlprague.cz/2010/" shape="rect">XML Prague</a> was accepted. I'm a little less delighted that the final paper deadline was last week, but I guess that's encouragement to finish it, eh? (I will, I promise.)</p>
            <p id="p2">I'm going to speak about modular documentation in DocBook, both about the proposal for “assemblies” being developed in the <a href="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook"
                  shape="rect">DocBook Technical Committee</a> and about my <a href="http://xproc.org/" shape="rect">XProc</a>-based implementation.</p>
            <p id="p3">I mention these things for only two reasons<sup class="footnote">[<a name="p3.1" href="#ftn.p3.1" id="p3.1" shape="rect">1</a>]</sup>: first, to recommend that <a href="http://www.xmlprague.cz/" shape="rect">XML Prague</a> is a conference you need to go to if you're interested in XML technologies. It's that good.</p>
            <p id="p5">Second, to plug <span class="personname">
                  <span class="firstname">G. Ken</span> 
                  <span class="surname">Holman</span>
               </span>’s <a href="http://www.cranesoftwrights.com/index.html#Crane201003CZ" shape="rect">XSLT/XPath 1.0 &amp; 2.0 and XQuery 1.0 Hands-on Training</a> class. If you're looking for someone to teach you XSLT and XQuery, you'd be hard pressed to do better than Ken. And if you're interested in XML, these are technologies you <em>need</em> to know. The maximum class size is an astonishing
<em>six</em>, so it's practically 1:1. Yet another reason to be in Prague in March.</p>
            <p id="p6">See you there!</p>
            <div class="footnotes">
               <hr width="100" align="left" class="footnotes-divider"/>
               <div class="footnote">
                  <p id="p4">
                     <sup>[<a href="#p3.1" name="ftn.p3.1" id="ftn.p3.1" shape="rect">1</a>]</sup>In the interest of full disclosure, I should point out that <a href="http://www.marklogic.com/" shape="rect">Mark Logic</a> is a gold sponsor of the conference and Ken's course is being delivered in partnership with our own <a href="http://www.marklogic.com/services/training.html" shape="rect">training services</a>. I don't think that makes me biased, but I guess I wouldn't, would I?</p>
               </div>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>NYMUG: Cloud deployment options</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2010/01/24/nymug"/>
      <id>http://norman.walsh.name/2010/01/24/nymug</id>
      <published>2010-01-24T18:16:47Z</published>
      <updated>2010-01-24T18:24:50Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Denise Miura, Sr. Director of Product Management, will be speaking about Mark Logic's new offering for the Cloud at our upcoming User Group in New York this Wednesday.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2010/01/24/nymug">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Denise Miura, Sr. Director of Product Management, will be speaking about Mark Logic's new offering for the Cloud at our upcoming User Group in New York this Wednesday.</p>
            </div>
            <p id="p1">The second Mark Logic New York User Group meeting will be held on Wednesday evening, 27 January 2010, hosted by <span class="personname">
                  <span class="firstname">Steve</span> 
                  <span class="surname">Kotrch</span>
               </span> from Simon &amp; Schuster!</p>
            <div class="variablelist">
               <dl>
                  <dt id="R.1.3.1.1">What</dt>
                  <dd>
                     <p id="p2">An opportunity to learn more about <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a> and collaborate with other MarkLogic users.</p>
                  </dd>
                  <dt id="R.1.3.2.1">When</dt>
                  <dd>
                     <p id="p3">Wednesday, 27 January 2010, at 6:00pm EST.</p>
                  </dd>
                  <dt id="R.1.3.3.1">Where</dt>
                  <dd>
                     <p id="p4">
                        <a href="http://maps.google.com/maps?f=q&amp;source=s_q&amp;hl=en&amp;geocode=&amp;q=1230+Avenue+of+the+Americas,+New+York,+NY&amp;sll=37.0625,-95.677068&amp;sspn=54.005807,51.416016&amp;ie=UTF8&amp;hq=&amp;hnear=1230+Avenue+of+the+Americas,+New+York,+10020&amp;z=17"
                           shape="rect">1230 Avenue of the Americas</a>, New York, NY between 48th and 49th streets on Sixth Avenue.</p>
                  </dd>
                  <dt id="R.1.3.4.1">Who</dt>
                  <dd>
                     <p id="p5">Everyone who shows up, of course! The featured speaker this time is Denise Miura, who will explain Mark Logic's new cloud deployment options and describe how they are being used today within the Mark Logic development community. She will demonstrate instantiating a MarkLogic AMI live on Amazon EC2 and talk about best practices for using MarkLogic on the EC2 platform. Finally, she will preview the planned cloud-related product enhancements. This is a great opportunity to provide feedback on, and influence, the cloud computing initiative at Mark Logic.</p>
                  </dd>
                  <dt id="R.1.3.5.1">How</dt>
                  <dd>
                     <p id="p6">If you plan to attend, please join the <a href="http://developer.marklogic.com/mailman/listinfo/nymug" shape="rect">mailing list</a> and send your first and last name to cleo dot saab at marklogic dot com.</p>
                  </dd>
               </dl>
            </div>
            <p id="p8">If you're in New York, please stop by (and please let Cleo know if you plan to stop by).</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>XProc: Back to Last Call</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/12/28/xproc-lc"/>
      <id>http://norman.walsh.name/2009/12/28/xproc-lc</id>
      <published>2009-12-28T14:20:52Z</published>
      <updated>2009-12-28T15:24:41Z</updated>
      <category term="w3c" scheme="http://technorati.com/tag/"/>
      <dc:subject>W3C</dc:subject>
      <category term="xproc" scheme="http://technorati.com/tag/"/>
      <dc:subject>XProc</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Early in January, a new XProc draft will appear. It will be a Last Call Working Draft, a step backwards in the process, or maybe just a half-step. The reason is important, though: versioning.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/12/28/xproc-lc">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Early in January, a new XProc draft will appear. It will be a Last Call Working Draft, a step backwards in the process, or maybe just a half-step. The reason is important, though: versioning.</p>
            </div>
            <p id="p1">The <a href="http://www.w3.org/XML/Processing/" shape="rect">XProc WG</a> has been making steady progress on <a href="http://www.w3.org/TR/xproc/" shape="rect">XProc: An XML Pipeline Language</a>. We saw the start of wide adoption of <a href="http://en.wikipedia.org/wiki/XML_pipeline"
                  title="Wikipedia: XML pipeline"
                  shape="rect">XProc</a>
               <a href="/knows/what/xproc" shape="rect">
                  <img border="0" alt="[L]" src="/graphics/linkgroup.gif"/>
               </a> in 2009 and I think there's every reason to expect more of the same in 2010.</p>
            <p id="p2">This makes it all the more disappointing to report that we're going back to <a href="http://www.w3.org/2005/10/Process-20051014/tr#last-call" shape="rect">Last Call</a>. On a personal note, as a <a href="http://en.wikipedia.org/wiki/Technical_Architecture_Group"
                  title="Wikipedia: Technical Architecture Group"
                  shape="rect">TAG</a> alum, it's a bit embarrassing to admit why: versioning.</p>
            <p id="p3">We received significant and persuasive criticism of our <a href="http://www.w3.org/TR/2009/CR-xproc-20090528/#versioning-considerations"
                  shape="rect">versioning story</a>. In particular, we were persuaded that requiring a processor to download the declarations for <em>V.next</em> steps in order to process them in a “forwards compatible” manner was too burdensome.</p>
            <p id="p4">In redrafting the story, we added a <tt class="tag-attribute">version</tt> attribute, “compile-time” <tt class="tag-attribute">use-when</tt> functionality <a href="http://www.w3.org/TR/xslt20/#conditional-inclusion" shape="rect">à la XSLT</a>, and extension functions for more precisely identifying the environment in which the pipeline is running.</p>
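            <p>As a rough illustration only, here is how a pipeline might combine the <tt class="tag-attribute">version</tt> attribute with <tt class="tag-attribute">use-when</tt>. This sketch assumes that the environment-testing function is spelled <tt>p:step-available()</tt> and that <tt class="tag-attribute">use-when</tt> appears unprefixed on elements in the XProc namespace. The details in the staged draft may differ. The step <tt class="tag-starttag">&lt;p:fancy-new-step&gt;</tt> is hypothetical and stands in for some future <em>V.next</em> step. Because the <tt class="tag-attribute">use-when</tt> tests are evaluated at “compile time,” a processor that doesn't know the new step simply drops that element instead of having to fetch its declaration.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="2.0"&gt;
  &lt;p:input port="source"/&gt;
  &lt;p:output port="result"/&gt;

  &lt;!-- Hypothetical V.next step: kept only if this processor has it --&gt;
  &lt;p:fancy-new-step use-when="p:step-available('p:fancy-new-step')"/&gt;

  &lt;!-- Fallback for processors that don't know the new step --&gt;
  &lt;p:identity use-when="not(p:step-available('p:fancy-new-step'))"/&gt;
&lt;/p:declare-step&gt;
</pre>
            </div>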
            <p id="p5">We also took the opportunity to fix decisions that were, in retrospect, mistakes but not, in and of themselves, sufficient to motivate us to return to Last Call. We changed the rules for connections in option, parameter, and variable bindings so that an explicit <tt class="tag-starttag">&lt;p:empty&gt;</tt> isn't required when there's no default readable port. We also changed the rules for parameter input ports so that a binding isn't required when there's at least one explicit <tt class="tag-starttag">&lt;p:with-param&gt;</tt>. Users are already thanking us.</p>
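            <p>Here is a minimal sketch of a pipeline that the two relaxed rules allow, assuming they work as just described. The file names <tt>doc.xml</tt> and <tt>style.xsl</tt> and the <tt>mode</tt> parameter are made up for illustration. The pipeline has no primary input, so the <tt class="tag-starttag">&lt;p:with-param&gt;</tt> has no default readable port. Under the old rules, it needed an explicit <tt class="tag-starttag">&lt;p:empty&gt;</tt>, and the <tt class="tag-starttag">&lt;p:xslt&gt;</tt> step's <tt>parameters</tt> port needed a binding of its own.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                version="1.0"&gt;
  &lt;p:output port="result"/&gt;

  &lt;p:xslt&gt;
    &lt;p:input port="source"&gt;
      &lt;p:document href="doc.xml"/&gt;
    &lt;/p:input&gt;
    &lt;p:input port="stylesheet"&gt;
      &lt;p:document href="style.xsl"/&gt;
    &lt;/p:input&gt;
    &lt;!-- No &lt;p:empty/&gt; needed here, and no explicit binding
         for the "parameters" port, under the new rules --&gt;
    &lt;p:with-param name="mode" select="'draft'"/&gt;
  &lt;/p:xslt&gt;
&lt;/p:declare-step&gt;
</pre>
            </div>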
            <p id="p6">I didn't get the document through the publication process before the end-of-year publishing moratorium, but you can read <a href="http://www.w3.org/XML/XProc/docs/WD-xproc-20091222/" shape="rect">the staged draft</a>, if you wish.</p>
            <p id="p7">Will this really be our last Last Call? I sincerely hope so! I also hope that we can move directly from Last Call to Proposed Recommendation without the formality of another Candidate Recommendation phase in between. There's precedent for doing this, and we've got active implementors.</p>
            <p id="p8">I'm sometimes frustrated by how long the process takes, but I console myself with the observation that the language is better for our efforts. Rising usage suggests the early adopters, at least, agree with us.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>David Alfred Walsh</title>
      <link rel="alternate" type="text/html" href="http://norman.walsh.name/2009/12/26/dad"/>
      <id>http://norman.walsh.name/2009/12/26/dad</id>
      <published>2009-12-26T20:38:58Z</published>
      <updated>2009-12-31T14:40:33Z</updated>
      <dc:subject>People</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>9 June 1923 — 26 November 2009.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/12/26/dad">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>9 June 1923 — 26 November 2009.</p>
            </div>
            <div class="epigraph">
               <p id="p2">A man may by custom fortify himself against pain, shame, and suchlike accidents; but as to death, we can experience it but once, and are all apprentices when we come to it.</p>
               <div class="attribution">
                  <span class="mdash">—</span>
                  <span class="personname">
                     <span class="surname">Montaigne</span>
                  </span>
               </div>
            </div>
            <p id="p1">My father was born in 1923 in Babylon, NY.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 331px">
                     <a href="http://www.flickr.com/photos/ndw/4193489909/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2556/4193489909_9968b98dc2.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 141px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>David Walsh, age 10</h3>
               </div>
            </div>
            <p id="p3">He survived the <a href="http://en.wikipedia.org/wiki/Great%20Depression"
                  title="Wikipedia: Great Depression"
                  shape="rect">Great Depression</a>. An enormous tree blew over next to him as he walked home through <a href="http://en.wikipedia.org/wiki/New_England_Hurricane_of_1938"
                  title="Wikipedia: New England Hurricane of 1938"
                  shape="rect">The Great Hurricane of 1938</a>; he walked away without a scratch. The <a href="http://en.wikipedia.org/wiki/Glider_infantry"
                  title="Wikipedia: Glider infantry"
                  shape="rect">glider-borne infantry</a>
took him to the <a href="http://en.wikipedia.org/wiki/China_Burma_India_Theater_of_World_War_II"
                  title="Wikipedia: China Burma India Theater of World War II"
                  shape="rect">China-Burma-India</a> theater in WWII.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 315px">
                     <a href="http://www.flickr.com/photos/ndw/4193497853/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2678/4193497853_475bc45843.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 133px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Bombay c. 1945</h3>
                  <div class="description">
                     <p>Left to right: Charles Kuhn, York PA; Bill Bride, Beacon NY; Eddy Evans, Boston MA; David Walsh, Babylon, NY</p>
                  </div>
               </div>
            </div>
            <p id="p4">Shrapnel chipped a tooth, but he survived that too. After the war he went to Alaska.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/4193493117/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2595/4193493117_a49982cafe.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Browerville from the tundra, Mar 1960</h3>
               </div>
            </div>
            <p id="p5">My dad taught in <a href="http://maps.google.com/maps?f=q&amp;source=s_q&amp;hl=en&amp;geocode=&amp;q=barrow,+ak&amp;sll=64.501111,-165.406389&amp;sspn=32.580803,52.119141&amp;ie=UTF8&amp;hq=&amp;hnear=Barrow,+North+Slope,+Alaska&amp;ll=63.194018,-157.587891&amp;spn=34.054271,52.119141&amp;z=4"
                  shape="rect">Barrow</a> and <a href="http://maps.google.com/maps?f=q&amp;source=s_q&amp;hl=en&amp;geocode=&amp;q=nome,+ak&amp;sll=64.997939,-155.478516&amp;sspn=32.028433,52.119141&amp;ie=UTF8&amp;hq=&amp;hnear=Nome,+Alaska&amp;ll=64.501111,-165.406389&amp;spn=32.580803,52.119141&amp;z=4"
                  shape="rect">Nome</a>. After putting out a chimney fire, he walked away from a two-story fall off a frozen roof by the lucky stroke of landing feet-first on an oil drum.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/4193491117/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2671/4193491117_bf16b02b6b.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Camping in Alaska, c. 1960</h3>
               </div>
            </div>
            <p id="p6">He single-handedly built a one-room cabin on a ¼-acre plot in Fairbanks. (I think I once saw a photo showing the scaffolding he built to get the roof beam in place.) He worked for the <a href="http://en.wikipedia.org/wiki/United_States_Fish_and_Wildlife_Service"
                  title="Wikipedia: United States Fish and Wildlife Service"
                  shape="rect">Fish and Wildlife Service</a> in the summers.</p>
            <p id="p7">He used to practice orienteering by walking into the Alaskan wilderness on a compass bearing and then walking back out again. On one occasion he stumbled across a downed single-engine plane containing the skeleton of its pilot. His boss laughed when my dad offered to lead a team back to the crash, assuring him that he'd never find it again. Dad's boss was right. There is <em>a lot</em> of wilderness out there.</p>
            <p id="p8">On another occasion, my dad shot a caribou only to discover as he prepared to dress it that he'd left his knife back in the jeep. Leaning his rifle against a tree, he walked back and got his knife. An enormous brown bear greeted his return by standing on its hind legs and roaring. The bear got the caribou. And the rifle. And the knife, dropped during a hasty retreat.</p>
            <p id="p9">That wasn't the only caribou that nearly got him killed; on another occasion, one attempted, unsuccessfully, to jump over his jeep. He woke on the side of the road with a caribou hoof protruding into the cab and a nasty gash on his head.</p>
            <p id="p10">I'm lucky to be here.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 333px">
                     <a href="http://www.flickr.com/photos/ndw/4194262042/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2743/4194262042_47c8e66da1.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 142px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Sleeping, June 1970</h3>
               </div>
            </div>
            <p id="p11">When my dad left Alaska, he gave the keys to his cabin to a friend. Those keys passed from friend to friend for more than twenty years. In the eighties, the current occupant persuaded my dad to let him buy the cabin. My father signed the deed and mailed it, asking the occupant to please mail the check back. The check came back a couple of weeks later. And it cleared. Luck of the Irish, or something.</p>
            <p id="p12">From Alaska, my dad traveled to Australia. My mom and dad met in Tasmania. They married in 1961.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 357px">
                     <a href="http://www.flickr.com/photos/ndw/4194255572/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2678/4194255572_933c2385e9.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 154px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Mom and dad, August 1961</h3>
               </div>
            </div>
            <p id="p13">I came along a few years later.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/4193489085/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2739/4193489085_feb699e381.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Mom, dad, and I</h3>
                  <div class="description">
                     <p>June 1968</p>
                  </div>
               </div>
            </div>
            <p id="p14">I remember my dad singing sea shanties when I was a small boy.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/4194245458/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm5.static.flickr.com/4046/4194245458_5db392377f.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Playing guitar</h3>
                  <div class="description">
                     <p>Time and place unknown</p>
                  </div>
               </div>
            </div>
            <p id="p15">Dad was a naturalist, hunter, trapper, fisherman, scientist, teacher, draftsman, and surveyor. He made beautiful wood carvings. He tied knots. At one time or another, <a href="http://en.wikipedia.org/wiki/List_of_knots"
                  title="Wikipedia: List of knots"
                  shape="rect">all of them</a>. I have his leatherworking tools. The old sewing machine on which he made sleeping bags, tents, parkas, rain slickers, and bicycle panniers got lost somewhere along the way. He built two boats.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 335px">
                     <a href="http://www.flickr.com/photos/ndw/4194244816/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2517/4194244816_68b9aa6e45.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 128px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a> 
                     <a href="http://maps.google.com/maps?ll=52.8406583333333,1.25690833333333&amp;z=16&amp;t=k"
                        shape="rect">
                        <img border="0" alt="[Google maps]" src="/graphics/map.png"/>
                     </a>
                  </div>
                  <h3>David Walsh, Sep 2009</h3>
               </div>
            </div>
            <p id="p16">After 86 years, entropy won. Entropy always wins. My dad taught me that. And the first and third <a href="http://en.wikipedia.org/wiki/Laws_of_thermodynamics"
                  title="Wikipedia: Laws of thermodynamics"
                  shape="rect">laws</a> as well.</p>
            <p id="p17">My father died in 2009 in Norwich, England.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/4155535952/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2766/4155535952_b95b328e86.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>'tis himself</h3>
                  <div class="description">
                     <p>David Alfred Walsh 9 June 1923 - 26 November 2009</p>
                  </div>
               </div>
            </div>
            <p id="p18">Goodbye, dad.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>NYMUG Summary</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/11/12/nymug"/>
      <id>http://norman.walsh.name/2009/11/12/nymug</id>
      <published>2009-11-12T16:16:03Z</published>
      <updated>2009-11-12T21:56:04Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Last night, I spoke at the inaugural New York Mark Logic User Group meeting. I think it was a crowd-pleaser, or at least the punchline at the end was.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/11/12/nymug">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Last night, I spoke at the inaugural New York Mark Logic User Group meeting. I think it was a crowd-pleaser, or at least the punchline at the end was.</p>
            </div>
            <p id="p1">The real purpose of a user group is to bring together <em>users</em> (and prospective users). There was much lively discussion after my presentation, which I won't attempt to recapitulate here. The next NYMUG meeting will (most likely) be sometime in January, so please plan to come. I'll post more concrete details here when they're available, and we'll send them to the <a href="http://developer.marklogic.com/mailman/listinfo/nymug" shape="rect">mailing list</a>, of course.</p>
            <p id="p2">My biggest problem in preparing for speaking events is figuring out what to talk about, and then figuring out how to stretch that topic to fit in the time allotted.</p>
            <p id="p3">When I was suggested as the speaker for our inaugural New York meeting, I had to figure out what to talk about. I quickly thought of a topic, but never imagined that it'd be possible. To my delight and surprise, when I shopped the idea around engineering and marketing, there was universal support for the idea. So after I picked my jaw up off the floor, I had to turn my attention to stretching the topic.</p>
            <p id="p4">The topic I had in mind would easily fit on a couple of slides. That would make for a pretty short talk. At a user group meeting, maybe that wouldn't be all bad, but I felt I needed to say a bit more. I stretched things out by talking about three new and cool features that I thought some folks might not have seen or used yet.</p>
            <div class="section">
               <h2 class="runin">Support for https: and URI rewriting </h2>
               <p class="runin" id="p5">
                  <a id="https" name="https" shape="rect"/>Support for <tt class="systemitem">https:</tt> is pretty self-explanatory. Lots of sites have private information (user profiles and passwords, for example) that <em>should not</em> (many would say “must not”) be sent over an insecure communications channel. Furthermore, most users have been trained to look for a secure connection before sending credit card or other financial information over the web.</p>
               <p id="p6">Support for URI rewriting allows application authors to build cleaner interfaces. I'm a fan of clean URI interfaces. Call me picky, but I think it's a lot better to expose the 4th slide in your presentation as <tt class="uri">http://example.com/slides/nymug/4</tt> than as <tt class="uri">http://example.com/slides.xqy?deck=nymug.xml&amp;foil=4&amp;format=html</tt>.</p>
               <p id="p7">Until recently, if you wanted to do this with a web site built on top of MarkLogic Server, you had to put up a proxy of some sort (often an Apache web server) to provide <tt class="systemitem">https:</tt> support and URI rewriting.</p>
               <p id="p8">Starting with MarkLogic Server V4.1, this is no longer necessary. MarkLogic Server now supports https (with your own certificate or a generated, self-signed one) out of the box. The server also supports URI rewriting by allowing you to designate an arbitrary query module to rewrite URIs. Here's the example I used in the presentation:</p>
               <div class="programlisting">
                  <pre xml:space="preserve">
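xquery version "1.0-ml";

(: The URL of the incoming request, including any query string :)
declare variable $url as xs:string
        := xdmp:get-request-url();

(: Map /slides/{deck}/{n} onto the slides.xqy module;
   any other URL passes through unchanged :)
if (matches($url, "^/slides/([^/]+)/([0-9]+)$"))
then
  replace($url,
          "^/slides/([^/]+)/([0-9]+)$",
          "/slides.xqy?deck=$1.xml&amp;amp;foil=$2")
else
  $url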
</pre>
               </div>
               <p id="p9">It's an incomplete toy example, taken from the real code on the server behind my presentation, but it gives you a flavor of how it works. Your module starts with the URL that was requested (and has access to the headers and other parts of the request), performs whatever computation you like, and returns the new URI. The server then processes that new URI normally.</p>
               <p id="p10">There may still be good reasons to put a proxy in front of MarkLogic Server (load balancing, etc.), but you no longer have to just to satisfy requests for these two common and simple features.</p>
            </div>
            <div class="section">
               <h2 class="runin">Office Toolkits </h2>
               <p class="runin" id="p11">
                  <a id="toolkits" name="toolkits" shape="rect"/>For reasons that will become clear later on (and not only because I find “office applications” to be an inefficient, frustrating, pointless time-suck), I wanted to give this presentation using ordinary web technologies. (Specifically, HTML+CSS+JavaScript served up by MarkLogic Server.)</p>
               <p id="p12">At the same time, because I was going to talk about new server features, I was required to present a disclaimer:</p>
               <div class="admonition" id="disclaimer">
                  <table border="0" cellspacing="0" cellpadding="4"
                         summary="Presentation of a admonition">
                     <tbody>
                        <tr>
                           <td valign="top" rowspan="1" colspan="1">
                              <span class="admon-graphic">
                                 <img alt="Important" src="/graphics/important.png"/>
                              </span>
                           </td>
                           <td rowspan="1" colspan="1">
                              <div class="admon-title-text">Disclaimer</div>
                              <div class="admon-text">
                                 <p id="p13">All statements describing future releases, estimated release dates and content are plans only, and Mark Logic is under no obligation to develop, include or make available, commercially or otherwise, any specific feature or functionality in any Mark Logic product.</p>
                                 <p id="p14">Information is provided for general understanding and informational purposes only, and is subject to change at the sole discretion of Mark Logic in response to changing customer requirements, market conditions, delivery schedules and other factors.</p>
                              </div>
                           </td>
                        </tr>
                     </tbody>
                  </table>
               </div>
               <p id="p15">(A disclaimer that applies as much to this weblog essay as it did to my presentation last night, I might add.)</p>
               <p id="p16">Trouble is, this slide was sent to me in Powerpoint. To use it, I'd have to switch to Powerpoint for the rest of my presentation (yuck!), copy and paste the text (where's the fun in that?), or find some way to incorporate the slide into my DocBook-based slide deck (now that sounds interesting).</p>
               <p id="p17">Luckily, one of our engineers, <span class="personname">
                     <span class="firstname">Pete</span> 
                     <span class="surname">Aven</span>
                  </span> has already done all the heavy lifting. Pete's the primary developer of Mark Logic's open source toolkits for Word, Excel, and Powerpoint. Each toolkit provides a pipeline for ingesting office documents into MarkLogic Server, an office plugin for using the server from within the application, and some XQuery code for working with the files in the server.</p>
               <p id="p18">With that framework in place, it was pretty easy to write a little bit of XQuery code that would extract a slide from a deck and transform it into DocBook. (It's about 20 lines of code and nothing fancy: it extracts paragraphs and bulleted lists from slides, no more, no less.)</p>
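               <p>For readers curious about what that kind of extraction involves, here is a minimal Python sketch (not the XQuery code described above) that walks the DrawingML paragraphs of a single slide part and emits DocBook <tt class="tag-starttag">&lt;para&gt;</tt> and <tt class="tag-starttag">&lt;itemizedlist&gt;</tt> elements. It assumes a paragraph is bulleted only when it carries an explicit <tt class="literal">a:buChar</tt> or <tt class="literal">a:buAutoNum</tt>; bullets inherited from slide layouts or masters are ignored, and the function names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Namespace URIs for DrawingML (text inside slide parts) and DocBook 5.
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
DB = "{http://docbook.org/ns/docbook}"


def paragraph_text(p):
    """Concatenate the text runs (a:t) of one DrawingML paragraph (a:p)."""
    return "".join(t.text or "" for t in p.iter(A + "t"))


def is_bulleted(p):
    """Assumption: only an explicit a:buChar or a:buAutoNum marks a bullet."""
    ppr = p.find(A + "pPr")
    if ppr is None:
        return False
    return (ppr.find(A + "buChar") is not None
            or ppr.find(A + "buAutoNum") is not None)


def slide_to_docbook(slide_root):
    """Turn a slide's paragraphs into DocBook para and itemizedlist elements."""
    foil = ET.Element(DB + "foil")
    current_list = None  # consecutive bullets share one itemizedlist
    for p in slide_root.iter(A + "p"):
        text = paragraph_text(p).strip()
        if not text:
            continue  # skip empty paragraphs
        if is_bulleted(p):
            if current_list is None:
                current_list = ET.SubElement(foil, DB + "itemizedlist")
            item = ET.SubElement(current_list, DB + "listitem")
            ET.SubElement(item, DB + "para").text = text
        else:
            current_list = None  # a plain paragraph closes any open list
            ET.SubElement(foil, DB + "para").text = text
    return foil
```

               <p>Calling <tt class="literal">slide_to_docbook</tt> on the root element of a parsed slide part (for example, <tt class="filename">ppt/slides/slide1.xml</tt>) yields a <tt class="tag-starttag">&lt;foil&gt;</tt> that would still need a title before it is complete slides markup.</p>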
               <p id="p19">The source for my final presentation looks like this:</p>
               <div class="programlisting">
                  <pre xml:space="preserve">
&lt;slides xmlns="http://docbook.org/ns/docbook"&gt;
  &lt;info&gt;
    &lt;title&gt;Transforming XML Development &lt;?lb?&gt;with MarkLogic&lt;/title&gt;
    …
  &lt;/info&gt;
  &lt;foilgroup&gt;

    &lt;foil&gt;
      &lt;title&gt;NYMUG!&lt;/title&gt;
      …
    &lt;/foil&gt;

    &lt;foil pptx="/pptx/disclaimer" foil="1"/&gt;

    …
  &lt;/foilgroup&gt;
&lt;/slides&gt;
</pre>
               </div>
               <p id="p20">a straightforward mixture of hand-authored slides and references to slides from a couple of Powerpoint decks. For the presentation, I edited a Powerpoint slide and ran it back through the process in real time, but that doesn't translate very well to a weblog essay.</p>
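               <p>The essay doesn't show how the Powerpoint references get expanded (the <tt class="literal">local:expand-powerpoint</tt> call in the XQuery that follows). A rough Python equivalent, assuming each converted slide can be obtained from a hypothetical <tt class="literal">load_slide(deck, number)</tt> callback, swaps every placeholder <tt class="tag-starttag">&lt;foil&gt;</tt> that has a <tt class="literal">pptx</tt> attribute for the converted foil and leaves hand-authored foils alone:</p>

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"


def expand_powerpoint(slides, load_slide):
    """Replace placeholder foils that point into a Powerpoint deck.

    A placeholder looks like: foil pptx="/pptx/disclaimer" foil="1".
    load_slide(deck, number) is a hypothetical callback that returns an
    already-converted DocBook foil element for that slide.
    """
    # Snapshot the elements first so the tree isn't modified mid-iteration.
    for parent in list(slides.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == DB + "foil" and "pptx" in child.attrib:
                # Assumption: a missing foil attribute means slide 1.
                number = int(child.get("foil", "1"))
                parent[i] = load_slide(child.get("pptx"), number)
    return slides
```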
            </div>
            <div class="section">
               <h2 class="runin">Presentation </h2>
               <p class="runin" id="p21">
                  <a id="presentation" name="presentation" shape="rect"/>That left just the last part of my talk, “the big reveal” as it were. Having set this all up so that I can author in DocBook, even including Powerpoint slides, serve it up over <tt class="systemitem">https:</tt>, and use nice looking URIs, I still have to go the last mile and get the content into the browser.</p>
               <p id="p22">A couple of obvious ways present themselves. I could serve it up as XML with a stylesheet and let the browser do the work. Could do, but I didn't. I could translate the DocBook markup into (X)HTML using XQuery in MarkLogic Server. Could do, but I didn't.</p>
               <p id="p23">What I really want, but haven't been able to do, is to transform the DocBook in MarkLogic Server with XSLT. And <tt class="tag-starttag">&lt;cue&gt;</tt>drum roll<tt class="tag-endtag">&lt;/cue&gt;</tt> … I can haz! <tt class="tag-starttag">&lt;cue&gt;</tt>cymbal crash<tt class="tag-endtag">&lt;/cue&gt;</tt>
               </p>
               <div class="programlisting">
                  <pre xml:space="preserve">
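(: Load the DocBook slide source :)
let $doc := xdmp:document-get(concat($ROOT, $xml))/*
(: Pull in the slides referenced from the Powerpoint decks :)
let $expanded := local:expand-powerpoint($doc)
(: Stylesheet parameters, keyed by QName :)
let $map := map:map()
let $put := map:put($map, ...(xs:QName("dbp:foil")), $foil)
let $put := map:put($map, ...(xs:QName("dbp:deck")), $deck)
let $put := map:put($map, ...(xs:QName("dbp:format")),
                    $format)
(: Run the XSLT stylesheet with those parameters :)
return
  xdmp:xslt-invoke($xslt, $expanded, $map)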
</pre>
               </div>
               <p id="p24">Running an internal build, I can demonstrate support for XSLT 2.0 in MarkLogic Server. (Go back and read that disclaimer again now.)</p>
               <p id="p25">There was some rejoicing at the user group meeting; I hope there's some rejoicing out there over the intertubes too. I'm certainly giddy with delight over the prospects of high-performance XSLT processing in MarkLogic Server V.future.</p>
               <p id="p26">What more can I say? When you've done your best trick, it's time to get off the stage.</p>
               <p id="p27">Thanks again to <span class="personname">
                     <span class="firstname">Steve</span> 
                     <span class="surname">Kotrch</span>
                  </span> and Simon &amp; Schuster for hosting. Hope to see you all in January!</p>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>NYMUG: New York Mark Logic Users Group!</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/11/04/nymug"/>
      <id>http://norman.walsh.name/2009/11/04/nymug</id>
      <published>2009-11-04T19:32:47Z</published>
      <updated>2009-11-05T20:49:36Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>The inaugural meeting of the New York Mark Logic User Group will take place on Wednesday evening, 11 November 2009.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/11/04/nymug">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>The inaugural meeting of the New York Mark Logic User Group will take place on Wednesday evening, 11 November 2009.</p>
            </div>
            <p id="p1">Please come join me at the inaugural meeting of the Mark Logic New York User Group on Wednesday evening, 11 November 2009, hosted by <span class="personname">
                  <span class="firstname">Steve</span> 
                  <span class="surname">Kotrch</span>
               </span> from Simon &amp; Schuster!</p>
            <div class="variablelist">
               <dl>
                  <dt>What</dt>
                  <dd>
                     <p id="p2">An opportunity to learn more about <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a> and collaborate with other MarkLogic users. Pizza and soft drinks will be provided and there will be drawings for prizes!</p>
                  </dd>
                  <dt>When</dt>
                  <dd>
                     <p id="p3">Wednesday, 11 November 2009, at 6:30pm EST.</p>
                  </dd>
                  <dt>Where</dt>
                  <dd>
                     <p id="p4">
                        <a href="http://maps.google.com/maps?f=q&amp;source=s_q&amp;hl=en&amp;geocode=&amp;q=1230+Avenue+of+the+Americas,+New+York,+NY&amp;sll=37.0625,-95.677068&amp;sspn=54.005807,51.416016&amp;ie=UTF8&amp;hq=&amp;hnear=1230+Avenue+of+the+Americas,+New+York,+10020&amp;z=17"
                           shape="rect">1230 Avenue of the Americas</a>, New York, NY between 48th and 49th streets on Sixth Avenue.</p>
                  </dd>
                  <dt>Who</dt>
                  <dd>
                     <p id="p5">Everyone who shows up, of course! For better or worse, I'm the headline speaker, if that's what you meant.</p>
                  </dd>
                  <dt>How</dt>
                  <dd>
                     <p id="p6">If you plan to attend, please join the <a href="http://developer.marklogic.com/mailman/listinfo/nymug" shape="rect">mailing list</a> and send your first and last name to cleo dot saab at marklogic dot com.</p>
                  </dd>
               </dl>
            </div>
            <p id="p7">I'd say more about what I'm going to say if I'd figured out more of what I'm going to say. The title of my presentation is <em class="citetitle">Transforming XML Development with MarkLogic</em>. I think it'll be a fairly interactive show. Without any spoilers, I'll admit to having some DocBook, <a href="http://norman.walsh.name/2009/11/04/docbook50" title="DocBook V5.0"
                  shape="rect">5.0 of course</a>, some Powerpoint, some HTML, and I'm pulling them together in some pretty damn cool ways, if I do say so
myself.</p>
            <p id="p8">If you're in New York, please stop by (and please let Cleo know if you plan to stop by).</p>
            <p id="p9">See you there!</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>DocBook V5.0</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/11/04/docbook50"/>
      <id>http://norman.walsh.name/2009/11/04/docbook50</id>
      <published>2009-11-04T18:55:29Z</published>
      <updated>2009-11-04T19:27:13Z</updated>
      <category term="docbook" scheme="http://technorati.com/tag/"/>
      <dc:subject>DocBook</dc:subject>
      <category term="oasis" scheme="http://technorati.com/tag/"/>
      <dc:subject>OASIS</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>DocBook V5.0 is an OASIS Standard!</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/11/04/docbook50">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>DocBook V5.0 is an OASIS Standard!</p>
            </div>
            <p id="p1">I <a href="http://norman.walsh.name/2009/10/19/docbook50"
                  title="Call for Vote - DocBook V5.0"
                  shape="rect">reported</a> a couple of weeks ago that the ballot to make DocBook V5.0 an OASIS Standard was open. I can now report that the ballot closed on 31 October.</p>
            <p id="p2">To my great satisfaction, I can also report that we easily cleared the <a href="http://www.oasis-open.org/committees/process-2009-07-30.php#OASISstandard"
                  shape="rect">process hurdles</a>.</p>
            <p id="p3">DocBook V5.0 is <a href="http://lists.oasis-open.org/archives/docbook-tc/200911/msg00001.html"
                  shape="rect">officially</a> an OASIS Standard.</p>
            <p id="p4">I'd like to thank everyone who took the time to help us get here: users, reviewers, members of the Technical Committee (past and present), and the OASIS members who voted for us, of course.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Evernote</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/11/01/evernote"/>
      <id>http://norman.walsh.name/2009/11/01/evernote</id>
      <published>2009-11-01T21:51:13Z</published>
      <updated>2009-11-02T13:58:39Z</updated>
      <category term="evernote" scheme="http://technorati.com/tag/"/>
      <dc:subject>Evernote</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>With a scanner and some Python, I'm an enthusiastic convert to Evernote.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/11/01/evernote">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>With a scanner and some Python, I'm an enthusiastic convert to Evernote.</p>
            </div>
            <p id="p1">I first tried <a href="http://www.evernote.com/" shape="rect">Evernote</a> more than a year ago. It seemed interesting, especially the ability to extract text from uploaded notes, even handwriting. It's not performing traditional OCR, but it does build a list of likely words in each note. That's enough to make search quite useful.</p>
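            <p>To make the idea concrete, here is a toy Python model of that kind of search. It assumes each recognized region of an image carries several candidate words with a confidence weight (the 0 to 100 scale is an assumption), and that a note matches when any region lists the query word. This is an illustration only, not Evernote's actual index format or search algorithm:</p>

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One recognized area of an image: candidate words mapped to a
    confidence weight (hypothetical 0-100 scale)."""
    candidates: dict = field(default_factory=dict)


def note_matches(regions, query, min_weight=0):
    """True if any region lists the query word among its candidates."""
    q = query.casefold()
    for region in regions:
        for word, weight in region.candidates.items():
            if word.casefold() == q and weight >= min_weight:
                return True
    return False
```

            <p>Because several guesses are kept for each scribble, a query can match even when the single "best" reading of the handwriting would have been wrong.</p>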
            <p id="p2">Here we see a match for “America” in my barely legible scrawl captured by <a href="http://www.jotnot.com/" shape="rect">JotNot</a>:</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/4067864012/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2631/4067864012_e9601362f9_d.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Evernote search</h3>
               </div>
            </div>
            <p id="p3">Even though this all seemed pretty cool, I never really used it very much. Part of the problem was that I didn't seem to be putting very much into it. If you've only got six notes, you don't need folders or tagging or clever search to find them.</p>
            <p id="p4">A few weeks ago, <span class="personname">
                  <span class="firstname">Michael</span> 
                  <span class="surname">Mealling</span>
               </span> 
               <a href="http://twitter.com/mmealling/status/4844492645" shape="rect">pointed out</a> that one way to fix the data problem was to scan documents directly into Evernote. <em>That</em> turned out to be an <em>excellent</em> suggestion.</p>
            <p id="p5">For example, I've had a manila folder of articles torn from magazines in a drawer in my desk for ages. The threshold for getting torn out and shoved in that folder was pretty high because, let's be honest, if I put too many things in that folder, I'll neither remember what's there nor be able to find it if I do remember.</p>
            <p id="p6">Twenty minutes with a scanner and suddenly I've got something that I can tag, categorize, and search painlessly. What's more, it's something I <em>have with me</em> on my laptop or my phone or anywhere I can get to a web browser: it's not in a folder in a drawer thousands of miles away (as my desk is this week, for example).</p>
            <p id="p7">Of course, then I had a <em>much bigger</em> problem. I am very, very reluctant to put my data in your application if I can't get it back again. Repeat after me: no roach motels. I wish Evernote every success in the world (I'm even doing my part, to the tune of $45/year, to keep them around), but it's not hard to imagine a future in which I've come to rely on them as a repository of important information only to discover some Thursday afternoon that they've gone away or <a href="http://www.wired.com/epicenter/2009/01/magnolia-suffer/" shape="rect">lost all my data</a> or otherwise left me high and dry.</p>
            <p id="p8">To their credit and my relief, they have an API. It's not the sort of RESTful Web API I've come to expect from these sorts of services, but that's ok. It's published and documented and claims to support Java, Perl, PHP, Python and Ruby out of the box.</p>
            <p id="p9">A few short hours of hacking and a few hundred lines of Python and I had a backup tool that gets back everything I put into Evernote <em>and</em> an XML representation of the search data. What more could I ask for?</p>
            <p id="p10">(If you're interested in trying <a href="examples/backup-evernote.py" shape="rect">my backup script</a>, you'll need to go through Evernote support to get your own API key and configure a few lines to indicate where you want the files stored, but after that it should work for you too. YMMV, of course.)</p>
            <p id="p11">My script creates an XML representation (natch!) of the details returned by the Evernote APIs. Eventually, I'll probably decide to fix things so it doesn't create a single potentially enormous file, but it doesn't much matter to me at the moment because I turn around and drop this into the <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a> instance that I use for my local PIM data.</p>
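            <p>A minimal sketch of the XML-writing half of such a backup, in Python with the standard <tt class="literal">xml.etree.ElementTree</tt> module. It assumes the notes have already been fetched through the API and flattened into dictionaries; the field names are placeholders rather than the exact Evernote API attribute names, and this is not the code in the linked backup script:</p>

```python
import xml.etree.ElementTree as ET


def notes_to_xml(notes):
    """Build one XML document from a list of note-metadata dicts.

    The keys used here (guid, title, created, tags) are illustrative
    assumptions, not the exact Evernote API attribute names.
    """
    root = ET.Element("notes")
    for note in notes:
        el = ET.SubElement(root, "note", guid=note["guid"])
        ET.SubElement(el, "title").text = note["title"]
        ET.SubElement(el, "created").text = str(note["created"])
        for tag in note.get("tags", []):
            ET.SubElement(el, "tag").text = tag
    # A single string for the whole backup; writing one file per note
    # would avoid building one potentially enormous document.
    return ET.tostring(root, encoding="unicode")
```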
            <p id="p12">Now, instead of collecting only the <em>very most</em> interesting articles I see and losing them in a folder, I collect anything that interests me even slightly, confident in the knowledge that I'll always be able to find it, and everything else, with ease. Alongside those articles, you'll find scanned business cards, recipes, specifications, notes, photographs of whiteboards and napkins, receipts, and a host of other interesting words and ideas that I've captured as the universe pushed them
past me.</p>
            <p id="p13">Pretty sweet.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Call for Vote - DocBook V5.0</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/10/19/docbook50"/>
      <id>http://norman.walsh.name/2009/10/19/docbook50</id>
      <published>2009-10-19T19:02:35Z</published>
      <updated>2009-10-19T19:15:32Z</updated>
      <category term="docbook" scheme="http://technorati.com/tag/"/>
      <dc:subject>DocBook</dc:subject>
      <category term="oasis" scheme="http://technorati.com/tag/"/>
      <dc:subject>OASIS</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>DocBook V5.0 is ready to become an OASIS Standard!</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/10/19/docbook50">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>DocBook V5.0 is ready to become an OASIS Standard!</p>
            </div>
            <p id="p1">It's been a long, and sometimes winding, path from DocBook V4.5 to DocBook V5.0, but we're finally approaching the finish line to make it official. The OASIS <a href="http://lists.oasis-open.org/archives/docbook-tc/200910/msg00009.html"
                  shape="rect">Call for Votes</a> went out on Friday!</p>
            <p id="p2">The <a href="http://www.oasis-open.org/apps/org/workgroup/voting/ballot.php?id=1785"
                  shape="rect">ballot</a> is open now through 31 October, 2009.</p>
            <p id="p3">If you belong to an <a href="http://www.oasis-open.org/about/foundational_sponsors.php" shape="rect">OASIS Member</a> organization, please encourage your representative to vote “Yes”. (The full list of eligible voting members is included on the ballot.)</p>
            <p id="p4">DocBook V5.0 is a significant milestone in the evolution of the standard. It is based on RELAX NG and designed with extensibility and flexibility in mind; making it an OASIS Standard will provide the solid foundation that we need to continue improving <em>The Source for Documentation</em>™.</p>
            <p id="p5">(We really do have some cool stuff coming down the pipe, so as they say in Chicago, vote early and vote often!)</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Micro-blogging Backup, part the fifth</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/10/18/mbb05"/>
      <id>http://norman.walsh.name/2009/10/18/mbb05</id>
      <published>2009-10-18T20:14:49Z</published>
      <updated>2009-10-19T15:47:02Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <category term="microblogging" scheme="http://technorati.com/tag/"/>
      <dc:subject>Microblogging</dc:subject>
      <category term="www" scheme="http://technorati.com/tag/"/>
      <dc:subject>TheWeb</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>In which we clean things up.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/10/18/mbb05">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>In which we clean things up.</p>
            </div>
            <p id="p1">If you've been using one of the micro-blogging services for a while, you're probably familiar with a set of conventions that have evolved for adding metadata to your status messages. The ones I'm familiar with are:</p>
            <div class="itemizedlist">
               <ul>
                  <li>
                     <p id="p2">
                        <em>@user</em> to identify another user.</p>
                  </li>
                  <li>
                     <p id="p3">
                        <em>#tag</em> to add a “tag” to your message.</p>
                  </li>
                  <li>
                     <p id="p4">
                        <em>!group</em> to identify a group (at the moment, this seems only to be an <a href="http://identi.ca/" shape="rect">Identi.ca</a> convention).</p>
                  </li>
               </ul>
            </div>
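            <p>A minimal Python sketch of spotting those conventions with regular expressions. It assumes a marker only counts at the start of the message or after whitespace, which keeps e-mail addresses and URL fragments out of the results; the real services apply their own, more detailed rules:</p>

```python
import re

# A marker counts only at the start of the text or after whitespace, so
# "me@example.com" or "http://example.com/#frag" are not picked up.
PATTERNS = {
    "mention": re.compile(r"(?:^|\s)@(\w+)"),
    "tag": re.compile(r"(?:^|\s)#(\w+)"),
    "group": re.compile(r"(?:^|\s)!(\w+)"),
}


def extract_metadata(status):
    """Return the @user, #tag and !group names found in a status message."""
    return {kind: pat.findall(status) for kind, pat in PATTERNS.items()}
```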
            <p id="p5">In addition to those conventions, the use of “URL shorteners” (<a href="http://tinyurl.com/" shape="rect">http://tinyurl.com/</a>, <a href="http://bit.ly/" shape="rect">http://bit.ly/</a>, <a href="http://is.gd/" shape="rect">http://is.gd/</a>, etc.) is common. And, finally, although it may not be apparent in the client you use, at the API level, individual status messages may indicate that they are “in-reply-to” some other message.</p>
            <p id="p6">So far, our micro-blogging backup system doesn't take advantage of any of this extra information.</p>
            <p id="p7">One of the first things I decided to do was expand shortened URIs. There's no 140-character limit in the database, and if you link to something on <a href="http://youtube.com/" shape="rect">http://youtube.com/</a>, the odds that I want to follow that link are within <a href="http://en.wikipedia.org/wiki/Limit_%28mathematics%29"
                  title="Wikipedia: Limit (mathematics)"
                  shape="rect">ε</a> of zero. I'd like to know before I click.</p>
            <p id="p8">As long as we're grovelling through the text of each message, it makes sense to expand the other conventions, turning them into the appropriate links.</p>
            <p id="p9">It also makes sense to download any messages that an existing message is “in-reply-to”. If those messages are also replies, we'll follow them too until the trail ends. This allows us to display whole conversations, even if they involve participants that we don't follow.</p>
            <p id="p10">All of this can be accomplished with one new module, <a href="examples/cleanup.xqy" shape="rect">/modules/cleanup.xqy</a>, and a new top-level query to drive it, <a href="examples/clean-tweets.xqy" shape="rect">/clean-tweets.xqy</a>. The interesting bits are in the <tt class="filename">cleanup.xqy</tt> module:</p>
            <div class="orderedlist">
               <ol style="list-style: decimal;">
                  <li>
                     <p id="p11">The actual work is just string manipulation: regular expressions and tokenize, mostly.</p>
                  </li>
                  <li>
                     <p id="p12">Following replies counts against your rate-limit, so we do at most 50 at a time.</p>
                  </li>
                  <li>
                     <p id="p13">To expand URIs, we perform HTTP HEAD requests against the URIs we find in the status messages. In the worst case, some of those may time out, so we do at most 500 at a time. That way we're unlikely to perform a query that takes so long that <em>it</em> times out.</p>
                  </li>
                  <li>
                     <p id="p14">If you look closely, you'll see that in addition to doing the expansions, we also add new elements to the status document: <tt class="tag-starttag">&lt;t:mention&gt;</tt> for mentions of another user, <tt class="tag-starttag">&lt;t:tag&gt;</tt> for tags, <tt class="tag-starttag">&lt;t:group&gt;</tt> for groups, and <tt class="tag-starttag">&lt;t:host&gt;</tt> for the host names of expanded URIs.</p>
                     <p id="p15">We'll come back in some future installment and use those for faceted searches (e.g., “find all the messages by <tt class="literal">@xmlcalabash</tt> that include links to <tt class="literal">tests.xproc.org</tt>”).</p>
                  </li>
               </ol>
            </div>
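            <p>The reply-following step, with its per-run limit, can be sketched in Python as follows. The status dictionaries and the <tt class="literal">fetch</tt> callback are hypothetical stand-ins for the database and the service API, and the default budget of 50 mirrors the limit described above. Because the pending list is rebuilt from the stored messages on every call, running it again simply continues where the previous run stopped:</p>

```python
def follow_replies(statuses, fetch, budget=50):
    """Fetch missing in-reply-to messages, following each chain until it
    ends or the per-run budget is used up.

    statuses: dict of id -> status dict with an optional "in_reply_to" id.
    fetch(id): hypothetical callback returning a status dict or None.
    Returns the number of fetches performed.
    """
    remaining = budget
    pending = [s["in_reply_to"] for s in statuses.values()
               if s.get("in_reply_to")]
    while pending and remaining > 0:
        parent_id = pending.pop()
        if parent_id in statuses:
            continue  # already stored: costs nothing
        remaining -= 1  # each fetch counts against the rate limit
        parent = fetch(parent_id)
        if parent is None:
            continue  # deleted or protected: the trail ends here
        statuses[parent_id] = parent
        if parent.get("in_reply_to"):
            pending.append(parent["in_reply_to"])  # keep climbing the chain
    return budget - remaining
```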
            <p id="p16">Pop the two files mentioned above into your setup (if this is your first encounter with my micro-blogging backup series, make sure you start at <a href="http://norman.walsh.name/2009/08/27/mbb01"
                  title="Micro-blogging Backup, part the first"
                  shape="rect">the beginning</a>).</p>
            <p id="p17">After you've installed the files, running <a href="http://localhost:8330/clean-tweets.xqy" shape="rect">http://localhost:8330/clean-tweets.xqy</a> will start cleaning up your database. If you've downloaded a lot of messages, you'll have to run it several times. If you have a lot of replies, you'll have to spread the runs over a few hours.</p>
            <p id="p18">The fact that you sometimes have to run the cleanup scripts several times is a bit inconvenient. I'm experimenting with some JavaScript to improve that, but I'm still looking for better solutions.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Built my own...</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/10/15/builtMyOwn"/>
      <id>http://norman.walsh.name/2009/10/15/builtMyOwn</id>
      <published>2009-10-16T01:31:15Z</published>
      <updated>2009-10-16T02:18:12Z</updated>
      <category term="linux" scheme="http://technorati.com/tag/"/>
      <dc:subject>Linux</dc:subject>
      <dc:subject>SelfReference</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Another geekdom rite of passage: building my own box.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/10/15/builtMyOwn">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Another geekdom rite of passage: building my own box.</p>
            </div>
            <p id="p1">As I noted <a href="http://norman.walsh.name/2009/06/03/buildingMyOwn"
                  title="Building my own…"
                  shape="rect">back in June</a>, I've always wanted to build my own box. Now I have.</p>
            <p id="p2">I don't, in retrospect, claim to have learned what “the right answer” is, but here's the answer I arrived at:</p>
            <div class="itemizedlist">
               <ul>
                  <li>
                     <p id="p3">AMD Phenom II X4 955 Black Edition CPU</p>
                     <div class="artwork">
                        <div class="flickr-photo">
                           <div class="photo" style="width: 500px">
                              <a href="http://www.flickr.com/photos/ndw/4014549981/" shape="rect">
                                 <img border="0" alt="[Photo]"
                                      src="http://farm3.static.flickr.com/2425/4014549981_1949998479.jpg"/>
                              </a>
                           </div>
                           <div class="link" style="left: 225px;">
                              <a href="http://www.flickr.com/" shape="rect">
                                 <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                              </a>
                           </div>
                           <h3>AMD Phenom II X4 Processor Black Edition</h3>
                        </div>
                     </div>
                     <p id="p4">I was more than a little amused at the discrepancy between the size of the CPU and the size of the aluminum and copper monstrosity that sits on top of it to keep it cool:</p>
                     <div class="artwork">
                        <div class="flickr-photo">
                           <div class="photo" style="width: 500px">
                              <a href="http://www.flickr.com/photos/ndw/4014551791/" shape="rect">
                                 <img border="0" alt="[Photo]"
                                      src="http://farm4.static.flickr.com/3492/4014551791_6f26f86fc8.jpg"/>
                              </a>
                           </div>
                           <div class="link" style="left: 225px;">
                              <a href="http://www.flickr.com/" shape="rect">
                                 <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                              </a>
                           </div>
                           <h3>Cooling apparatus for the CPU</h3>
                        </div>
                     </div>
                  </li>
                  <li>
                     <p id="p5">ASUS M4A78T-E AM3 790GX HDMI ATX AMD motherboard</p>
                     <div class="artwork">
                        <div class="flickr-photo">
                           <div class="photo" style="width: 500px">
                              <a href="http://www.flickr.com/photos/ndw/4014572189/" shape="rect">
                                 <img border="0" alt="[Photo]"
                                      src="http://farm4.static.flickr.com/3512/4014572189_0ecfc8aecf.jpg"/>
                              </a>
                           </div>
                           <div class="link" style="left: 225px;">
                              <a href="http://www.flickr.com/" shape="rect">
                                 <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                              </a>
                           </div>
                           <h3>Motherboard in place</h3>
                        </div>
                     </div>
                  </li>
                  <li>
                     <p id="p6">4 x G.Skill 2GB 240-pin DDR3 1600 memory</p>
                     <div class="artwork">
                        <div class="flickr-photo">
                           <div class="photo" style="width: 500px">
                              <a href="http://www.flickr.com/photos/ndw/4015324442/" shape="rect">
                                 <img border="0" alt="[Photo]"
                                      src="http://farm3.static.flickr.com/2528/4015324442_a11180f982.jpg"/>
                              </a>
                           </div>
                           <div class="link" style="left: 225px;">
                              <a href="http://www.flickr.com/" shape="rect">
                                 <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                              </a>
                           </div>
                           <h3>Four of 8Gb</h3>
                        </div>
                     </div>
                  </li>
                  <li>
                     <p id="p7">4 x WD Caviar Green 1TB SATA hard drives</p>
                     <div class="artwork">
                        <div class="flickr-photo">
                           <div class="photo" style="width: 375px">
                              <a href="http://www.flickr.com/photos/ndw/4014565113/" shape="rect">
                                 <img border="0" alt="[Photo]"
                                      src="http://farm4.static.flickr.com/3479/4014565113_856bee4655.jpg"/>
                              </a>
                           </div>
                           <div class="link" style="left: 163px;">
                              <a href="http://www.flickr.com/" shape="rect">
                                 <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                              </a>
                           </div>
                           <h3>Gobs of space</h3>
                        </div>
                     </div>
                  </li>
                  <li>
                     <p id="p8">LG Black 8x BD-ROM 16x DVD-ROM 40x CD-ROM SATA LightScribe burner</p>
                     <div class="artwork">
                        <div class="flickr-photo">
                           <div class="photo" style="width: 500px">
                              <a href="http://www.flickr.com/photos/ndw/4014568137/" shape="rect">
                                 <img border="0" alt="[Photo]"
                                      src="http://farm3.static.flickr.com/2478/4014568137_a5bc32ac25.jpg"/>
                              </a>
                           </div>
                           <div class="link" style="left: 225px;">
                              <a href="http://www.flickr.com/" shape="rect">
                                 <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                              </a>
                           </div>
                           <h3>Optical drive</h3>
                        </div>
                     </div>
                  </li>
                  <li>
                     <p id="p9">APEVIA ATX-AS680W-BL 680W ATX12V/EPS12V SLI power supply</p>
                  </li>
                  <li>
                     <p id="p10">XCLIO Windtunnel ATX Full Tower case</p>
                     <div class="artwork">
                        <div class="flickr-photo">
                           <div class="photo" style="width: 500px">
                              <a href="http://www.flickr.com/photos/ndw/4015325928/" shape="rect">
                                 <img border="0" alt="[Photo]"
                                      src="http://farm3.static.flickr.com/2477/4015325928_3b1656f55d.jpg"/>
                              </a>
                           </div>
                           <div class="link" style="left: 225px;">
                              <a href="http://www.flickr.com/" shape="rect">
                                 <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                              </a>
                           </div>
                           <h3>Case with power supply in place</h3>
                        </div>
                     </div>
                  </li>
               </ul>
            </div>
            <p id="p11">I was pleasantly surprised by how easy it was to assemble. The documentation was clear and complete, the parts were all well labelled, and it went together without a hitch.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/4015338008/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2483/4015338008_691a78251e.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>All systems go!</h3>
               </div>
            </div>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/4014575611/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2475/4014575611_c98205a670.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Glow in the dark</h3>
               </div>
            </div>
            <p id="p12">(For a few more pictures, see <a href="http://www.flickr.com/photos/ndw/sets/72157622593140610/" shape="rect">Custom Build 2009</a> on <a href="http://www.flickr.com/" shape="rect">Flickr</a>.)</p>
            <p id="p13">I decided to run the bleeding-edge <a href="http://en.wikipedia.org/wiki/Ubuntu_%28operating_system%29"
                  title="Wikipedia: Ubuntu (operating system)"
                  shape="rect">Ubuntu</a> 9.10 Server beta on it. It's not that I wouldn't like to try <a href="http://en.wikipedia.org/wiki/Solaris_%28operating_system%29"
                  title="Wikipedia: Solaris (operating system)"
                  shape="rect">Solaris</a> and <a href="http://en.wikipedia.org/wiki/ZFS" title="Wikipedia: ZFS" shape="rect">ZFS</a>, but…the days are short and the list of projects is long. Installing Ubuntu
is just that little bit easier. Ubuntu 9.10 installed flawlessly.</p>
            <p id="p14">The ASUS motherboard includes a video controller and assorted other controllers. It claims to do <a href="http://en.wikipedia.org/wiki/RAID" title="Wikipedia: RAID" shape="rect">RAID</a>, but a little investigation reveals that it's <a href="http://en.wikipedia.org/wiki/RAID%23Firmware.2Fdriver-based_RAID"
                  title="Wikipedia: RAID#Firmware.2Fdriver-based RAID"
                  shape="rect">fakeraid</a>, so I abandoned it. Instead, I set up software RAID. The first disk is the boot disk; the remaining three provide 2TB of storage in a RAID5 configuration (RAID5 gives up one disk's worth of space to parity, so three 1TB disks yield 2TB).</p>
            <p id="p15">The case, for all those monstrous fans, is pretty quiet, but not silent. I still think the server might get relocated to the basement, except that I worry about the higher humidity.</p>
            <p id="p16">Anyway, <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a> is up and running, and as soon as I get my <a href="http://norman.walsh.name/2009/08/27/mbb01"
                  title="Micro-blogging Backup, part the first"
                  shape="rect">Micro-blogging backup</a> configuration ported over, I promise I'll write that next installment…</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>XML Calabash 0.9.15</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/10/05/xmlcalabash"/>
      <id>http://norman.walsh.name/2009/10/05/xmlcalabash</id>
      <published>2009-10-06T01:04:11Z</published>
      <updated>2009-10-06T13:19:03Z</updated>
      <category term="calabash" scheme="http://technorati.com/tag/"/>
      <dc:subject>Calabash</dc:subject>
      <category term="java" scheme="http://technorati.com/tag/"/>
      <dc:subject>Java</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>A new release at last. New features, fewer bugs, and test suite clean again.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/10/05/xmlcalabash">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>A new release at last. New features, fewer bugs, and test suite clean again.</p>
            </div>
            <p id="p1">Speaking <a href="dominicanrepublic#p1" shape="rect">of work</a> (letting the non sequiturs pile up), when I wasn't on the beach, I was hacking <a href="http://xmlcalabash.com/" shape="rect">XML Calabash</a>. I always update the <a href="http://norman.walsh.name/2008/projects/calabash"
                  title="XML Calabash: an XProc implementation"
                  shape="rect">project status page</a>, but in case that's too subtle, perhaps this essay will catch your attention. (You're reading this, so I guess that should be “caught” without the “perhaps”, but
never mind.)</p>
            <p id="p2">Next in the task queue: supporting Saxon 9.2. I fear this is going to be a bit of a b*tch as I reach down into the guts of Saxon in a few places. Fair warning: I don't plan to attempt to support previous versions of Saxon after I make this switch.</p>
            <div class="section">
               <h2 class="runin">XML Calabash Logging </h2>
               <p class="runin" id="p3">
                  <a id="logging" name="logging" shape="rect"/>One of the changes in 0.9.15 is the switch from my own crufty, home-grown logging infrastructure to the Java core logging facilities. On the plus side, this gives you a lot more control over the logging. On the minus side, you have to deal with the core logging facilities.</p>
               <p id="p4">Near as I can tell, this has to be done with a properties file. Here's what I do: I set the system property <span class="property">java.util.logging.config.file</span> to point to my own logging configuration file, <tt class="filename">/Users/ndw/java/logging.properties</tt>.</p>
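<p>If passing the system property on the command line is inconvenient, the same file can, as far as I know, also be loaded at runtime through <tt class="literal">LogManager.readConfiguration</tt>. Here is a minimal sketch; <tt class="literal">LoggingSetup</tt> and <tt class="literal">loadConfig</tt> are hypothetical names for illustration, not part of XML Calabash.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.LogManager;

/*
 * Minimal sketch: load a java.util.logging properties file at runtime.
 * Roughly equivalent to starting the JVM with
 *   -Djava.util.logging.config.file=/path/to/logging.properties
 * LoggingSetup and loadConfig are illustrative names, not XML Calabash API.
 */
class LoggingSetup {

    // Call this early, before anything logs: readConfiguration() resets the
    // LogManager (removing existing handlers) and then applies the new properties.
    static void loadConfig(Path propertiesFile) throws IOException {
        try (InputStream in = Files.newInputStream(propertiesFile)) {
            LogManager.getLogManager().readConfiguration(in);
        }
    }
}
```

<p>For example, <tt class="literal">LoggingSetup.loadConfig(Paths.get("/Users/ndw/java/logging.properties"))</tt> at the top of <tt class="literal">main</tt>. The command-line property is still the simpler option because it takes effect before any class has had a chance to log.</p>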
               <p id="p5">My logging properties file, constructed mostly through trial and error, is shown below. The important bits are:</p>
               <div class="calloutlist">
                  <table border="0" summary="Callout list">
                     <tr class="callout-row">
                        <td class="callout-bug" valign="baseline" align="left" rowspan="1" colspan="1">
                           <p>
                              <a href="#conslevel" shape="rect">
                                 <img alt="1" border="0" src="/graphics/callouts/1.png"/>
                              </a>  </p>
                        </td>
                        <td class="callout-body" valign="baseline" align="left" rowspan="1" colspan="1">
                           <p id="p6">There are two places to control the amount of detail in the logs, and this is the first. By default, the console handler won't print any messages more detailed than “<tt class="literal">INFO</tt>”. If you want more detail, you have to lower this threshold (here it's set to “<tt class="literal">FINE</tt>”).</p>
                        </td>
                     </tr>
                     <tr class="callout-row">
                        <td class="callout-bug" valign="baseline" align="left" rowspan="1" colspan="1">
                           <p>
                              <a href="#consfmt" shape="rect">
                                 <img alt="2" border="0" src="/graphics/callouts/2.png"/>
                              </a>  </p>
                        </td>
                        <td class="callout-body" valign="baseline" align="left" rowspan="1" colspan="1">
                           <p id="p7">I really dislike the default message format, so I wrote a formatter that produces slightly more compact output. You might like it too.</p>
                        </td>
                     </tr>
                     <tr class="callout-row">
                        <td class="callout-bug" valign="baseline" align="left" rowspan="1" colspan="1">
                           <p>
                              <a href="#calabashlevel" shape="rect">
                                 <img alt="3" border="0" src="/graphics/callouts/3.png"/>
                              </a>  </p>
                        </td>
                        <td class="callout-body" valign="baseline" align="left" rowspan="1" colspan="1">
                           <p id="p8">This is the other place where you can control the amount of detail. “<tt class="literal">ALL</tt>” gives you all the messages. You might prefer “<tt class="literal">SEVERE</tt>”, which gives you only the most serious errors.</p>
                           <p id="p9">You can be selective here. If, for some reason, you want to see the gory detail of the XInclude step but ignore everything else, you could set the following (the console handler's level would also have to be lowered to “<tt class="literal">FINEST</tt>”, or the extra detail never reaches the console):</p>
                           <div class="programlisting">
                              <pre xml:space="preserve">
com.xmlcalabash.level=SEVERE
com.xmlcalabash.XInclude.level=FINEST
</pre>
                           </div>
                        </td>
                     </tr>
                  </table>
               </div>
               <p id="p10">I'll try to provide better documentation for the available logger names.</p>
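<p>The two knobs interact: a message is printed only if it passes both the originating logger's level and the handler's level. As a rough illustration, here is a minimal sketch of the selective XInclude settings made in code rather than in the properties file. <tt class="literal">CountingHandler</tt> is a hypothetical stand-in for <tt class="literal">ConsoleHandler</tt> that just records what it would have printed.</p>

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/*
 * Sketch: the two level knobs set in code instead of logging.properties.
 * A record is published only if it passes the originating logger's level
 * AND the handler's level. CountingHandler is a hypothetical stand-in for
 * ConsoleHandler that just records what it would have printed.
 */
class LevelKnobs {

    static class CountingHandler extends Handler {
        int published = 0;
        final StringBuilder seen = new StringBuilder();

        @Override
        public void publish(LogRecord record) {
            if (!isLoggable(record)) {
                return; // filtered by this handler's level (the ConsoleHandler.level knob)
            }
            published++;
            seen.append(record.getLoggerName()).append(':')
                .append(record.getLevel()).append('\n');
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    static CountingHandler demo() {
        CountingHandler handler = new CountingHandler();
        handler.setLevel(Level.FINEST);        // handler knob: accept FINEST and above

        Logger calabash = Logger.getLogger("com.xmlcalabash");
        calabash.setUseParentHandlers(false);  // keep the demo off the real console
        calabash.addHandler(handler);
        calabash.setLevel(Level.SEVERE);       // like com.xmlcalabash.level=SEVERE

        Logger xinclude = Logger.getLogger("com.xmlcalabash.XInclude");
        xinclude.setLevel(Level.FINEST);       // like com.xmlcalabash.XInclude.level=FINEST

        calabash.info("dropped by the logger: INFO is below SEVERE");
        xinclude.finest("kept: XInclude is set to FINEST");
        calabash.severe("kept: SEVERE passes both knobs");

        calabash.removeHandler(handler);       // tidy up the shared logger
        return handler;
    }
}
```

<p>The XInclude record reaches the handler because child loggers pass records up to their parent's handlers without re-checking the parent's level; the INFO record never gets that far.</p>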
               <p id="p11">Here's my current logging properties file:</p>
               <div class="programlisting">
                  <pre xml:space="preserve">
############################################################
# Logging Configuration File
#
# You can use a different file by specifying a filename
# with the java.util.logging.config.file system property.  
# For example java -Djava.util.logging.config.file=myfile
############################################################

############################################################
#       Global properties
############################################################

# "handlers" specifies a comma separated list of log Handler 
# classes.  These handlers will be installed during VM startup.
# Note that these classes must be on the system classpath.
# By default we only configure a ConsoleHandler, which will only
# show messages at the INFO and above levels.
handlers=java.util.logging.ConsoleHandler

# To also add the FileHandler, use the following line instead.
#handlers= java.util.logging.FileHandler, java.util.logging.ConsoleHandler

# Default global logging level.
# This specifies which kinds of events are logged across
# all loggers.  For any given facility this global level
# can be overridden by a facility specific level
# Note that the ConsoleHandler also has a separate level
# setting to limit messages printed to the console.
.level = ALL

############################################################
# Handler specific properties.
# Describes specific configuration info for Handlers.
############################################################

# default file output is in user's home directory.
java.util.logging.FileHandler.pattern = %h/java%u.log
java.util.logging.FileHandler.limit = 50000
java.util.logging.FileHandler.count = 1
java.util.logging.FileHandler.formatter = java.util.logging.XMLFormatter

# Print messages at FINE and above on the console (the default is INFO).
java.util.logging.ConsoleHandler.level=FINE<a name="conslevel" id="conslevel" shape="rect"/><img alt="1" border="0" src="/graphics/callouts/1.png"/>
java.util.logging.ConsoleHandler.formatter=com.xmlcalabash.util.LogFormatter<a name="consfmt" id="consfmt" shape="rect"/><img alt="2" border="0" src="/graphics/callouts/2.png"/>

############################################################
# Facility specific properties.
# Provides extra control for each logger.
############################################################

com.xmlcalabash.level=ALL<a name="calabashlevel" id="calabashlevel" shape="rect"/><img alt="3" border="0" src="/graphics/callouts/3.png"/>
</pre>
               </div>
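<p>The <tt class="literal">com.xmlcalabash.util.LogFormatter</tt> named above isn't reproduced here. As an illustration of what a compact formatter involves, here is a minimal sketch; the class name <tt class="literal">CompactFormatter</tt> and its one-line layout are assumptions, not the XML Calabash implementation.</p>

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

/*
 * Hypothetical compact formatter (NOT com.xmlcalabash.util.LogFormatter).
 * Produces one line per record:
 *   HH:mm:ss LEVEL [LastLoggerNameSegment] message
 */
class CompactFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        // SimpleDateFormat isn't thread-safe, so make a fresh one per call.
        String time = new SimpleDateFormat("HH:mm:ss").format(new Date(record.getMillis()));
        StringBuilder line = new StringBuilder()
            .append(time).append(' ')
            .append(record.getLevel().getName()).append(' ')
            .append('[').append(shortName(record.getLoggerName())).append("] ")
            .append(formatMessage(record))   // fills in {0}-style parameters
            .append(System.lineSeparator());
        if (record.getThrown() != null) {
            line.append("    ").append(record.getThrown()).append(System.lineSeparator());
        }
        return line.toString();
    }

    // "com.xmlcalabash.XInclude" becomes "XInclude"; a null name becomes "".
    static String shortName(String loggerName) {
        if (loggerName == null) {
            return "";
        }
        return loggerName.substring(loggerName.lastIndexOf('.') + 1);
    }
}
```

<p>To use something like this, you would point <tt class="literal">java.util.logging.ConsoleHandler.formatter</tt> at the class's fully qualified name. Because the LogManager instantiates formatters reflectively, a real version would need to be a public class with a public no-argument constructor, on the system classpath.</p>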
               <p id="p12">I'm fully open to suggestions for better approaches.</p>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>XML Summer School ’09</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/10/05/xmlss09"/>
      <id>http://norman.walsh.name/2009/10/05/xmlss09</id>
      <published>2009-10-05T23:36:02Z</published>
      <updated>2009-10-06T13:19:03Z</updated>
      <category term="photographs" scheme="http://technorati.com/tag/"/>
      <dc:subject>Photography</dc:subject>
      <category term="travel" scheme="http://technorati.com/tag/"/>
      <dc:subject>Travel</dc:subject>
      <category term="xml" scheme="http://technorati.com/tag/"/>
      <dc:subject>XML</dc:subject>
      <category term="xmlss09" scheme="http://technorati.com/tag/"/>
      <dc:subject>XMLSummerSchool2009</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Open source and web technologies at XML Summer School.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/10/05/xmlss09">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Open source and web technologies at XML Summer School.</p>
            </div>
            <p id="p1">Speaking of travelling (a bit of a non sequitur if you aren't reading these essays sequentially), I had just returned from <a href="http://xmlsummerschool.com/" shape="rect">XML Summer School</a> the day before my <a href="http://norman.walsh.name/2009/10/05/dominicanrepublic"
                  title="Dominican Republic"
                  shape="rect">Caribbean jaunt</a>.</p>
            <p id="p2">I was delighted to see the return of XML Summer School (September is summer somewhere, I'm sure) and even more delighted to be invited to speak.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3987058416/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2506/3987058416_3dcf5519ee.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 210px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a> 
                     <a href="http://maps.google.com/maps?ll=51.7535027777778,-1.25336666666667&amp;z=16&amp;t=k"
                        shape="rect">
                        <img border="0" alt="[Google maps]" src="/graphics/map.png"/>
                     </a>
                  </div>
                  <h3>Radcliffe Camera</h3>
               </div>
            </div>
            <p id="p3">I taught in the “Open Source” and “Web 2.0” tracks, presenting <a href="examples/opensource.pdf" shape="rect">Open Source Application Development</a> and <a href="examples/web20.pdf" shape="rect">Building Dynamic Web Applications with XML</a>, respectively. I think the sessions went well. (After more than ten years of doing it, I have finally, and quite suddenly this year, reached the point where giving presentations doesn't make me gut-knottingly nervous. That has to have improved my stage presence.)</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 334px">
                     <a href="http://www.flickr.com/photos/ndw/3986299811/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2513/3986299811_f42defb144.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 127px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a> 
                     <a href="http://maps.google.com/maps?ll=51.7527305555556,-1.24996111111111&amp;z=16&amp;t=k"
                        shape="rect">
                        <img border="0" alt="[Google maps]" src="/graphics/map.png"/>
                     </a>
                  </div>
                  <h3>Oxford sunset</h3>
               </div>
            </div>
            <p id="p4">Beyond the specific courses, what sets Summer School apart is the staggering array of talent lined up to teach. If you've got a question or a problem about something even tangentially related to markup, there's someone at XML Summer School who's thought about it, built it, or deployed it. Probably all three. That we're all good friends, engaging students not just in the classroom but also punting and pub crawling, is just gravy for everyone.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3987049542/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2659/3987049542_5323b9ca45.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 210px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a> 
                     <a href="http://maps.google.com/maps?ll=51.7482111111111,-1.24675555555556&amp;z=16&amp;t=k"
                        shape="rect">
                        <img border="0" alt="[Google maps]" src="/graphics/map.png"/>
                     </a>
                  </div>
                  <h3>Punters</h3>
               </div>
            </div>
            <p id="p5">A bargain at twice the price, I promise.</p>
            <p id="p6">After Summer School, I snagged a weekend with my folks.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 334px">
                     <a href="http://www.flickr.com/photos/ndw/3987064130/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2631/3987064130_3dc00cb314.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 142px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Got seeds?</h3>
               </div>
            </div>
            <p id="p7">A relaxing end to a great week, though not one without hard work and long days.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Dominican Republic</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/10/05/dominicanrepublic"/>
      <id>http://norman.walsh.name/2009/10/05/dominicanrepublic</id>
      <published>2009-10-05T22:51:30Z</published>
      <updated>2009-10-06T13:19:03Z</updated>
      <category term="dominicanrepublic" scheme="http://technorati.com/tag/"/>
      <dc:subject>DominicanRepublic</dc:subject>
      <category term="photographs" scheme="http://technorati.com/tag/"/>
      <dc:subject>Photography</dc:subject>
      <category term="travel" scheme="http://technorati.com/tag/"/>
      <dc:subject>Travel</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>A long weekend in the Dominican Republic brings me to country number 15.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/10/05/dominicanrepublic">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>A long weekend in the Dominican Republic brings me to country number 15.</p>
            </div>
            <p id="p1">The Dominican Republic is <a href="http://norman.walsh.name/2008/09/14/antigua" title="Antigua and Barbuda"
                  shape="rect">country number fifteen</a> for me. Deb had to work, and I did <a href="http://norman.walsh.name/2008/projects/calabash"
                  title="XML Calabash: an XProc implementation"
                  shape="rect">a little work</a> too, but mostly I was free to enjoy the sun and the sea and the sand.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3986321471/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2575/3986321471_1c16a5df51.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 210px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a> 
                     <a href="http://maps.google.com/maps?ll=18.7383638888889,-68.4796722222222&amp;z=16&amp;t=k"
                        shape="rect">
                        <img border="0" alt="[Google maps]" src="/graphics/map.png"/>
                     </a>
                  </div>
                  <h3>Punta Cana Beach</h3>
               </div>
            </div>
            <p id="p2">We don't live in the Caribbean. Remind me why?</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3986319793/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm4.static.flickr.com/3496/3986319793_06d61d6530.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 210px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a> 
                     <a href="http://maps.google.com/maps?ll=18.7365611111111,-68.4788416666667&amp;z=16&amp;t=k"
                        shape="rect">
                        <img border="0" alt="[Google maps]" src="/graphics/map.png"/>
                     </a>
                  </div>
                  <h3>Moon Palace Pool</h3>
               </div>
            </div>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 334px">
                     <a href="http://www.flickr.com/photos/ndw/3987077580/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2533/3987077580_b1cac4c6e4.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 127px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a> 
                     <a href="http://maps.google.com/maps?ll=18.7393833333333,-68.4817111111111&amp;z=16&amp;t=k"
                        shape="rect">
                        <img border="0" alt="[Google maps]" src="/graphics/map.png"/>
                     </a>
                  </div>
                  <h3>Footprints</h3>
               </div>
            </div>
            <p id="p3">The Moon Palace resort was lovely (in a big “cruise ship on land” sort of way). This is not a criticism, just an observation, but there are clearly no <a href="http://en.wikipedia.org/wiki/Occupational_Safety_and_Health_Administration"
                  title="Wikipedia: Occupational Safety and Health Administration"
                  shape="rect">OSHA</a> inspectors in the Dominican Republic.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3987081082/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2537/3987081082_686872a273.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>OSHA? What's that?</h3>
               </div>
            </div>
            <p id="p4">Or maybe there are no personal-injury lawyers; I don't know. The tripping hazard in our room was not the only structural feature that made me do a double take.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>SQL to XML</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/09/26/sqltoxml"/>
      <id>http://norman.walsh.name/2009/09/26/sqltoxml</id>
      <published>2009-09-26T12:01:23Z</published>
      <updated>2009-09-26T16:05:00Z</updated>
      <category term="osx" scheme="http://technorati.com/tag/"/>
      <dc:subject>OSX</dc:subject>
      <category term="xml" scheme="http://technorati.com/tag/"/>
      <dc:subject>XML</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>A number of Mac applications store information in SQLite databases. Step one to do something useful with that data is to get it into XML.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/09/26/sqltoxml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>A number of Mac applications store information in SQLite databases. Step one to do something useful with that data is to get it into XML.</p>
            </div>
            <p id="p1">I hate having my data squirreled away in proprietary or quasi-proprietary ways. If I can't get my information back out of an app, I'd rather not use it. When I started using the Mac, I switched to <span class="application">iCal</span> and <span class="application">AddressBook</span>: both can export data in standard text formats which I can easily convert to XML.</p>
            <p id="p2">But exporting the data is a manual process (I could probably automate it with some clever AppleScript, but I've never tried). I build a number of views of my address book and calendar data automatically, so manual processes don't fit well into my workflow.</p>
            <p id="p3">It didn't take long to figure out where <span class="application">iCal</span> stores my appointments or how to pull them together, so next I turned my attention to <span class="application">AddressBook</span>.</p>
            <p id="p4">Long story short: the address data is saved in a database in <tt class="filename">~/Library/Application Support/AddressBook/AddressBook-v22.abcddb</tt>. After installing the <span class="application">sqlite3</span> application from MacPorts, I was able to extract a text dump. So far so good.</p>
            <p id="p5">Here, for example, are a table definition and the first row of data in that table:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
CREATE TABLE ZABCDRECORD ( Z_PK INTEGER PRIMARY KEY, Z_ENT INTEGER,
Z_OPT INTEGER, ZDISPLAYFLAGS INTEGER, ZMODIFICATIONDATEYEAR INTEGER,
ZCREATIONDATEYEAR INTEGER, ZADDRESSBOOKSOURCE INTEGER, ZISALL INTEGER,
ZME INTEGER, Z19_ME INTEGER, ZINFO INTEGER, ZBIRTHDAYYEAR INTEGER,
ZPRIVACYFLAGS INTEGER, ZNOTE INTEGER, ZADDRESSBOOKSOURCE1 INTEGER,
ZCONTACTINDEX INTEGER, ZSOURCEWHERECONTACTISME INTEGER, ZVERSION
INTEGER, ZSYNCCOUNT INTEGER, ZSHARECOUNT INTEGER, ZADDRESSBOOKSOURCE2
INTEGER, ZMODIFICATIONDATE TIMESTAMP, ZCREATIONDATE TIMESTAMP,
ZMODIFICATIONDATEYEARLESS FLOAT, ZCREATIONDATEYEARLESS FLOAT,
ZBIRTHDAY TIMESTAMP, ZBIRTHDAYYEARLESS FLOAT, ZUNIQUEID VARCHAR, ZNAME
VARCHAR, ZNAMENORMALIZED VARCHAR, ZTMPREMOTELOCATION VARCHAR, ZNAME1
VARCHAR, ZREMOTELOCATION VARCHAR, ZSERIALNUMBER VARCHAR, ZSUFFIX
VARCHAR, ZTITLE VARCHAR, ZTMPHOMEPAGE VARCHAR, ZNICKNAME VARCHAR,
ZORGANIZATION VARCHAR, ZMAIDENNAME VARCHAR, ZIDENTITYUNIQUEID VARCHAR,
ZPHONETICFIRSTNAME VARCHAR, ZDEPARTMENT VARCHAR, ZPHONETICLASTNAME
VARCHAR, ZMIDDLENAME VARCHAR, ZFIRSTNAME VARCHAR, ZIMAGEREFERENCE
VARCHAR, ZJOBTITLE VARCHAR, ZPHONETICMIDDLENAME VARCHAR, ZLASTNAME
VARCHAR, ZSORTINGFIRSTNAME VARCHAR, ZSORTINGLASTNAME VARCHAR,
ZCREATEDVERSION VARCHAR, ZLASTDOTMACACCOUNT VARCHAR, ZLASTSAVEDVERSION
VARCHAR, ZSYNCANCHOR VARCHAR, ZSEARCHELEMENTDATA BLOB,
ZMODIFIEDUNIQUEIDSDATA BLOB );
INSERT INTO "ZABCDRECORD" VALUES(1,18,287,NULL,NULL,2008,NULL,1,NULL,NULL,
3,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
239119683.670433,NULL,18281283.670433,NULL,NULL,
'93973926-7EF6-40F0-ADBD-8C7BBFC30FA1:ABSubscriptionRecord',NULL,
NULL,NULL,NULL,'local','B7303AAD-DA79-46E6-BC7D-91DAD82AEFB8',
NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,
NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);
</pre>
            </div>
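            <p>For comparison, Python's standard <tt class="literal">sqlite3</tt> module can produce a similar SQL dump without the command-line tool: <tt class="methodname">Connection.iterdump()</tt> yields the <tt class="literal">CREATE TABLE</tt> and <tt class="literal">INSERT</tt> statements one at a time. This is a minimal sketch, not the procedure described above; the function name <tt class="literal">write_dump</tt> is hypothetical, and opening the file read-only is simply a precaution for a live database.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import pathlib
import sqlite3


def write_dump(db_path, out):
    """Write SQL text (CREATE TABLE and INSERT statements) that would
    recreate the database at db_path to the file-like object out."""
    # Open read-only via a file: URI so the live database can't be
    # modified, and a mistyped path fails instead of creating a new file.
    uri = pathlib.Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # iterdump() yields the dump one SQL statement at a time.
        for statement in conn.iterdump():
            out.write(statement + "\n")
    finally:
        conn.close()
</pre>
            </div>
            <p>Calling <tt class="literal">write_dump("~/Library/Application Support/AddressBook/AddressBook-v22.abcddb", sys.stdout)</tt> should print statements much like the ones above.</p>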
            <p id="p6">Next, I wrote <a href="examples/sqltoxml" shape="rect">150 or so lines</a> of <a href="http://en.wikipedia.org/wiki/Perl" title="Wikipedia: Perl" shape="rect">Perl</a> to convert the text into XML.</p>
            <p id="p7">The XML is designed to be a totally straightforward representation of the table structure of the database. From the preceding SQL statements, <span class="application">sqltoxml</span> produces:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;table name='ZABCDRECORD'&gt;
&lt;columns&gt;
  &lt;column name='Z_PK' type='INTEGER'/&gt;
  &lt;column name='Z_ENT' type='INTEGER'/&gt;
  &lt;column name='Z_OPT' type='INTEGER'/&gt;
  &lt;column name='ZDISPLAYFLAGS' type='INTEGER'/&gt;
  &lt;column name='ZMODIFICATIONDATEYEAR' type='INTEGER'/&gt;
  &lt;column name='ZCREATIONDATEYEAR' type='INTEGER'/&gt;
  &lt;column name='ZADDRESSBOOKSOURCE' type='INTEGER'/&gt;
  &lt;column name='ZISALL' type='INTEGER'/&gt;
  &lt;column name='ZME' type='INTEGER'/&gt;
  &lt;column name='Z19_ME' type='INTEGER'/&gt;
  &lt;column name='ZINFO' type='INTEGER'/&gt;
  &lt;column name='ZBIRTHDAYYEAR' type='INTEGER'/&gt;
  &lt;column name='ZPRIVACYFLAGS' type='INTEGER'/&gt;
  &lt;column name='ZNOTE' type='INTEGER'/&gt;
  &lt;column name='ZADDRESSBOOKSOURCE1' type='INTEGER'/&gt;
  &lt;column name='ZCONTACTINDEX' type='INTEGER'/&gt;
  &lt;column name='ZSOURCEWHERECONTACTISME' type='INTEGER'/&gt;
  &lt;column name='ZVERSION' type='INTEGER'/&gt;
  &lt;column name='ZSYNCCOUNT' type='INTEGER'/&gt;
  &lt;column name='ZSHARECOUNT' type='INTEGER'/&gt;
  &lt;column name='ZADDRESSBOOKSOURCE2' type='INTEGER'/&gt;
  &lt;column name='ZMODIFICATIONDATE' type='TIMESTAMP'/&gt;
  &lt;column name='ZCREATIONDATE' type='TIMESTAMP'/&gt;
  &lt;column name='ZMODIFICATIONDATEYEARLESS' type='FLOAT'/&gt;
  &lt;column name='ZCREATIONDATEYEARLESS' type='FLOAT'/&gt;
  &lt;column name='ZBIRTHDAY' type='TIMESTAMP'/&gt;
  &lt;column name='ZBIRTHDAYYEARLESS' type='FLOAT'/&gt;
  &lt;column name='ZUNIQUEID' type='VARCHAR'/&gt;
  &lt;column name='ZNAME' type='VARCHAR'/&gt;
  &lt;column name='ZNAMENORMALIZED' type='VARCHAR'/&gt;
  &lt;column name='ZTMPREMOTELOCATION' type='VARCHAR'/&gt;
  &lt;column name='ZNAME1' type='VARCHAR'/&gt;
  &lt;column name='ZREMOTELOCATION' type='VARCHAR'/&gt;
  &lt;column name='ZSERIALNUMBER' type='VARCHAR'/&gt;
  &lt;column name='ZSUFFIX' type='VARCHAR'/&gt;
  &lt;column name='ZTITLE' type='VARCHAR'/&gt;
  &lt;column name='ZTMPHOMEPAGE' type='VARCHAR'/&gt;
  &lt;column name='ZNICKNAME' type='VARCHAR'/&gt;
  &lt;column name='ZORGANIZATION' type='VARCHAR'/&gt;
  &lt;column name='ZMAIDENNAME' type='VARCHAR'/&gt;
  &lt;column name='ZIDENTITYUNIQUEID' type='VARCHAR'/&gt;
  &lt;column name='ZPHONETICFIRSTNAME' type='VARCHAR'/&gt;
  &lt;column name='ZDEPARTMENT' type='VARCHAR'/&gt;
  &lt;column name='ZPHONETICLASTNAME' type='VARCHAR'/&gt;
  &lt;column name='ZMIDDLENAME' type='VARCHAR'/&gt;
  &lt;column name='ZFIRSTNAME' type='VARCHAR'/&gt;
  &lt;column name='ZIMAGEREFERENCE' type='VARCHAR'/&gt;
  &lt;column name='ZJOBTITLE' type='VARCHAR'/&gt;
  &lt;column name='ZPHONETICMIDDLENAME' type='VARCHAR'/&gt;
  &lt;column name='ZLASTNAME' type='VARCHAR'/&gt;
  &lt;column name='ZSORTINGFIRSTNAME' type='VARCHAR'/&gt;
  &lt;column name='ZSORTINGLASTNAME' type='VARCHAR'/&gt;
  &lt;column name='ZCREATEDVERSION' type='VARCHAR'/&gt;
  &lt;column name='ZLASTDOTMACACCOUNT' type='VARCHAR'/&gt;
  &lt;column name='ZLASTSAVEDVERSION' type='VARCHAR'/&gt;
  &lt;column name='ZSYNCANCHOR' type='VARCHAR'/&gt;
  &lt;column name='ZSEARCHELEMENTDATA' type='BLOB'/&gt;
  &lt;column name='ZMODIFIEDUNIQUEIDSDATA' type='BLOB'/&gt;
&lt;/columns&gt;
&lt;rows&gt;
  &lt;row&gt;
    &lt;Z_PK&gt;1&lt;/Z_PK&gt;
    &lt;Z_ENT&gt;18&lt;/Z_ENT&gt;
    &lt;Z_OPT&gt;287&lt;/Z_OPT&gt;
    &lt;ZCREATIONDATEYEAR&gt;2008&lt;/ZCREATIONDATEYEAR&gt;
    &lt;ZISALL&gt;1&lt;/ZISALL&gt;
    &lt;ZINFO&gt;3&lt;/ZINFO&gt;
    &lt;ZCREATIONDATE&gt;239119683.670433&lt;/ZCREATIONDATE&gt;
    &lt;ZCREATIONDATEYEARLESS&gt;18281283.670433&lt;/ZCREATIONDATEYEARLESS&gt;
    &lt;ZUNIQUEID&gt;93973926-7EF6-40F0-ADBD-8C7BBFC30FA1:ABSubscriptionRecord&lt;/ZUNIQUEID&gt;
    &lt;ZREMOTELOCATION&gt;local&lt;/ZREMOTELOCATION&gt;
    &lt;ZSERIALNUMBER&gt;B7303AAD-DA79-46E6-BC7D-91DAD82AEFB8&lt;/ZSERIALNUMBER&gt;
  &lt;/row&gt;
  &lt;!-- ... --&gt;
&lt;/rows&gt;
&lt;/table&gt;
</pre>
            </div>
            <p id="p8">As you can see, I've made no effort to maintain some aspects of the database (like the primary key), I've simply dropped NULL fields, and I'm relying on the field names to be valid XML NCNames. There are clearly other, equally reasonable, design choices that I could have made.</p>
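            <p>The Perl script works from the text dump. As an alternative, here is a minimal Python sketch that follows the same design choices (NULL fields dropped, column names used directly as element names) but reads the database with the standard <tt class="literal">sqlite3</tt> module instead of parsing a dump. The function name <tt class="literal">table_to_xml</tt>, the rough NCName check, and the hex encoding of BLOB values are assumptions of this sketch, not features of <span class="application">sqltoxml</span>.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import re
import sqlite3
import xml.etree.ElementTree as ET

# ASCII-only approximation of an XML NCName: good enough to check that a
# column name can be used directly as an element name.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def table_to_xml(conn, table):
    """Return a table element with columns and rows children, mirroring
    the layout of the sqltoxml output shown above."""
    quoted = '"%s"' % table.replace('"', '""')  # SQL identifier quoting
    table_el = ET.Element("table", name=table)
    columns_el = ET.SubElement(table_el, "columns")
    names = []
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, coltype, _notnull, _default, _pk in conn.execute(
            "PRAGMA table_info(%s)" % quoted):
        if not NCNAME.fullmatch(name):
            raise ValueError("column %r can't be used as an element name" % name)
        ET.SubElement(columns_el, "column", name=name, type=coltype)
        names.append(name)
    rows_el = ET.SubElement(table_el, "rows")
    # SELECT * returns columns in the same order as PRAGMA table_info.
    for values in conn.execute("SELECT * FROM %s" % quoted):
        row_el = ET.SubElement(rows_el, "row")
        for name, value in zip(names, values):
            if value is None:
                continue  # NULL fields are simply dropped
            if isinstance(value, bytes):
                value = value.hex()  # BLOBs as hex: an arbitrary choice
            ET.SubElement(row_el, name).text = str(value)
    return table_el
</pre>
            </div>
            <p>Serializing the result with <tt class="literal">ET.tostring(table_to_xml(conn, "ZABCDRECORD"), encoding="unicode")</tt> gives markup with the same shape as the example, minus the indentation.</p>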
            <p id="p9">The resulting XML is the bare minimum needed to switch to XML tools for downstream processing (turning address book tables into vCards, for example). But it gets the job done.</p>
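            <p>As an illustration of that downstream step, here is a hypothetical Python sketch that turns one <tt class="literal">row</tt> element from the XML above into a minimal vCard 3.0. That <tt class="literal">ZFIRSTNAME</tt>, <tt class="literal">ZLASTNAME</tt>, and <tt class="literal">ZORGANIZATION</tt> hold the contact's name and company is an assumption based on the column names, and the sketch ignores vCard line folding and every other property.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import xml.etree.ElementTree as ET


def escape(value):
    # vCard 3.0 (RFC 2426) text values escape backslash, comma, semicolon
    # and newline.
    return (value.replace("\\", "\\\\").replace(",", "\\,")
                 .replace(";", "\\;").replace("\n", "\\n"))


def row_to_vcard(row):
    """Build a minimal vCard from one row element (or its serialization).

    NULL columns are simply absent from the row, so missing fields are
    treated as empty strings.
    """
    if isinstance(row, str):
        row = ET.fromstring(row)
    first = row.findtext("ZFIRSTNAME", default="")
    last = row.findtext("ZLASTNAME", default="")
    org = row.findtext("ZORGANIZATION", default="")
    full = " ".join(p for p in (first, last) if p) or org
    lines = ["BEGIN:VCARD", "VERSION:3.0",
             "N:%s;%s;;;" % (escape(last), escape(first)),
             "FN:" + escape(full)]
    if org:
        lines.append("ORG:" + escape(org))
    lines.append("END:VCARD")
    # vCard lines are CRLF-terminated.
    return "\r\n".join(lines) + "\r\n"
</pre>
            </div>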
            <p id="p10">And maybe it'll come in handy for someone else.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>RDFa for DocBook?</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/09/22/RDFaForDocBook"/>
      <id>http://norman.walsh.name/2009/09/22/RDFaForDocBook</id>
      <published>2009-09-22T13:20:39Z</published>
      <updated>2009-09-26T16:04:15Z</updated>
      <category term="docbook" scheme="http://technorati.com/tag/"/>
      <dc:subject>DocBook</dc:subject>
      <category term="rdf" scheme="http://technorati.com/tag/"/>
      <dc:subject>RDF</dc:subject>
      <category term="xmlss09" scheme="http://technorati.com/tag/"/>
      <dc:subject>XMLSummerSchool2009</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Adding RDFa to DocBook would make it possible to add a class of semantic annotations to DocBook without changing the schema. But is that a good idea?</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/09/22/RDFaForDocBook">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>Adding RDFa to DocBook would make it possible to add a class of semantic annotations to DocBook without changing the schema. But is that a good idea?</p>
            </div>
            <div class="epigraph">
               <p id="p2">Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information on it.</p>
               <div class="attribution">
                  <span class="mdash">—</span>
                  <span class="personname">
                     <span class="firstname">Samuel</span> 
                     <span class="surname">Johnson</span>
                  </span>
               </div>
            </div>
            <p id="p1">When <span class="personname">
                  <span class="firstname">Bob</span> 
                  <span class="surname">DuCharme</span>
               </span> introduced the semantic web track at <a href="http://www.xmlsummerschool.com/" shape="rect">XML Summer School</a> this morning, he briefly mentioned the idea of adding <a href="http://en.wikipedia.org/wiki/RDFa" title="Wikipedia: RDFa" shape="rect">RDFa</a> to vocabularies other than (X)HTML. In particular, he's investigated how to <a href="http://www.devx.com/semantic/Article/42543/0/page/3" shape="rect">do it in DocBook</a>.</p>
            <p id="p3">The DocBook TC gets periodic requests to add new inline elements and attributes for bits of metadata. Sometimes the requests are entirely legitimate, in the sense that they're clearly about technical documentation, but seem to apply to such a small audience that the TC is reluctant to add them to all of DocBook.</p>
            <p id="p4">With this in mind, the idea of adding RDFa has some appeal: we add a few new attributes and henceforth users will be able to add new bits of metadata without having to change the DocBook schema.</p>
            <p id="p5">But I'm not sure.</p>
            <p id="p6">First, lots of DocBook elements have more specific semantics than HTML elements. We don't need to say</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;phrase property="dc:title"&gt;Beautiful Sunset&lt;/phrase&gt;
</pre>
            </div>
            <p id="p7">because we have <tt class="tag-starttag">&lt;citetitle&gt;</tt>. We don't need to say:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;info&gt;
  &lt;bibliomisc&gt;
    &lt;phrase rel="mpc:editor" href="http://mypubco.com/empid/53234"/&gt;
  &lt;/bibliomisc&gt;
&lt;/info&gt;
</pre>
            </div>
            <p id="p8">because we have</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;info&gt;
  &lt;editor role="mpc:editor"&gt;
    &lt;personname&gt;Some Name&lt;/personname&gt;
    &lt;uri&gt;http://mypubco.com/empid/53234&lt;/uri&gt;
  &lt;/editor&gt;
&lt;/info&gt;
</pre>
            </div>
            <p id="p9">I'm not suggesting those are <em>exactly</em> the same (they're clearly not), but I'm comfortable that existing DocBook elements are sufficient for the task.</p>
            <p id="p10">(Yes, you'd need a DocBook-specific tool to extract the metadata, which is a disadvantage, but you probably want one anyway for the existing DocBook semantics.)</p>
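            <p>As a rough illustration of such a DocBook-specific tool, here is a Python sketch that pulls editor metadata out of an <tt class="tag-starttag">&lt;info&gt;</tt> element like the one above using ElementTree. The function names are hypothetical, and it only looks at <tt class="tag-starttag">&lt;personname&gt;</tt> and <tt class="tag-starttag">&lt;uri&gt;</tt>.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import xml.etree.ElementTree as ET

DOCBOOK = "{http://docbook.org/ns/docbook}"  # the DocBook V5 namespace


def text_of(el):
    """All the text inside el, whitespace-normalized ('' if el is None)."""
    if el is None:
        return ""
    return " ".join("".join(el.itertext()).split())


def editors(info, ns=DOCBOOK):
    """Yield (role, name, uri) for each editor element in info.

    Pass ns="" for markup that isn't in the DocBook namespace, such as
    the unqualified example above.  Strings are parsed first.
    """
    if isinstance(info, str):
        info = ET.fromstring(info)
    for editor in info.iter(ns + "editor"):
        yield (editor.get("role"),
               text_of(editor.find(ns + "personname")),
               text_of(editor.find(ns + "uri")))
</pre>
            </div>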
            <p id="p11">Second, it would allow you to construct statements with conflicting or, at best, odd semantics:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;section&gt;
  &lt;title property="dc:creator"&gt;Alice1&lt;/title&gt;
  &lt;para&gt;This is from section 2.2.&lt;/para&gt;
&lt;/section&gt;
</pre>
            </div>
            <p id="p13">I can just about imagine a sense in which “Alice1” can be both the title of a section and the <a href="http://en.wikipedia.org/wiki/Dublin%20Core"
                  title="Wikipedia: Dublin Core"
                  shape="rect">Dublin Core</a> creator of the section, but it doesn't make a lot of sense.</p>
            <p id="p14">Third, Bob's example seems to suggest that it would encourage markup like this:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;para about="/alice/posts/trouble_with_bob"&gt;
  &lt;phrase property="dc:title"&gt;The trouble with Bob2&lt;/phrase&gt;
  &lt;phrase property="dc:creator"&gt;Alice2&lt;/phrase&gt;
&lt;/para&gt;
</pre>
            </div>
            <p id="p16">which seems like a bad idea to me.</p>
            <p id="p17">On the other hand, some of the examples do seem useful for exactly the sort of metadata that motivated my interest in the first place:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;bibliomisc property="mpc:lastScreenShotDate" content="2009-08-01T15:31:00"/&gt;
&lt;bibliomisc property="mpc:softwareRelease"    content="3.1"/&gt;
</pre>
            </div>
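            <p>To make the contrast concrete, here is a Python sketch of a drastically simplified RDFa-style extractor: every element with a <tt class="literal">property</tt> attribute yields a (subject, property, value) triple, taking the value from <tt class="literal">content</tt> if present and from the element's text otherwise, with <tt class="literal">about</tt> setting the subject. It is not a conforming RDFa processor (no prefix or CURIE resolution, no <tt class="literal">rel</tt>, <tt class="literal">rev</tt>, or datatype handling), and the function name is hypothetical.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import xml.etree.ElementTree as ET


def literal_triples(root, subject=""):
    """Yield (subject, property, value) for each element in root (root
    included) that carries a property attribute.  An about attribute sets
    the subject for its element and descendants; the default subject ""
    stands for the document itself."""
    if isinstance(root, str):
        root = ET.fromstring(root)

    def walk(el, subject):
        subject = el.get("about", subject)
        prop = el.get("property")
        if prop is not None:
            # Prefer the content attribute; otherwise use the text content.
            value = el.get("content")
            if value is None:
                value = "".join(el.itertext())
            yield (subject, prop, value)
        for child in el:
            yield from walk(child, subject)

    yield from walk(root, subject)
</pre>
            </div>
            <p>Applied to the <tt class="tag-starttag">&lt;section&gt;</tt> example above, it would report “Alice1” as the <tt class="literal">dc:creator</tt> of the document as a whole, which illustrates the odd semantics described there.</p>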
            <p id="p18">In fairness, Bob set out to recreate the triples from the original tutorial, so some of the markup choices were forced upon him.</p>
            <p id="p19">So I'm not sure.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Micro-blogging Backup, part the fourth</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/09/09/mbb04"/>
      <id>http://norman.walsh.name/2009/09/09/mbb04</id>
      <published>2009-09-09T15:11:49Z</published>
      <updated>2009-09-09T16:32:34Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <category term="microblogging" scheme="http://technorati.com/tag/"/>
      <dc:subject>Microblogging</dc:subject>
      <category term="www" scheme="http://technorati.com/tag/"/>
      <dc:subject>TheWeb</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>In which we get to see what our tweets and ’dents look like.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/09/09/mbb04">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>In which we get to see what our tweets and ’dents look like.</p>
            </div>
            <p id="p1">If you haven't been following along, go back and read parts <a href="http://norman.walsh.name/2009/08/27/mbb01"
                  title="Micro-blogging Backup, part the first"
                  shape="rect">one</a> and <a href="http://norman.walsh.name/2009/08/28/mbb02"
                  title="Micro-blogging Backup, part the second"
                  shape="rect">two</a> first and, for a little background, part <a href="http://norman.walsh.name/2009/09/03/mbb03"
                  title="Micro-blogging Backup, part the third"
                  shape="rect">three</a>. At this point you've got <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a> up and running and you've been able to download your tweets and the tweets of those you follow. (Tweets or ’dents, depending on which microblogging service you prefer; either works for me, and so do both.)</p>
            <p id="p2">Next, download <a href="examples/mbb04.zip" shape="rect">mbb04.zip</a> and unpack it in the same place where you unpacked <a href="/2009/08/28/examples/mbb02.zip" shape="rect">mbb02.zip</a>. If you were following the instructions, you've edited some of the files in the “<tt class="filename">mbb/inst</tt>” directory and you may have some sessions saved in CQ, so I <em>have not</em> included those directories in <tt class="filename">mbb04.zip</tt>.</p>
            <p id="p3">If you've been tinkering with other files, you'll want to unpack this zip with some care or you may overwrite your changes. But you won't overwrite changes made to the installation or CQ areas. (By the same token, this distribution is incomplete without <tt class="filename">mbb02.zip</tt>.)</p>
            <p id="p4">With that installed, you now have a CSS file, four modules, and a new “top level” script, <tt class="filename">show-tweets.xqy</tt>. That's the fun one. Point your browser at <a href="http://localhost:8330/show-tweets.xqy" shape="rect">http://localhost:8330/show-tweets.xqy</a> and you should be rewarded with a list of your status messages from today. (As before, adjust the port number as necessary if you installed the application server on a different port.)</p>
            <p id="p5">If you don't have any status messages from today, load <a href="http://localhost:8330/get-tweets.xqy" shape="rect">http://localhost:8330/get-tweets.xqy</a> to download your most recent messages, then try <a href="http://localhost:8330/show-tweets.xqy" shape="rect">http://localhost:8330/show-tweets.xqy</a> again.</p>
            <p id="p6">The <tt class="filename">show-tweets.xqy</tt> script accepts query parameters. You can pass:</p>
            <div class="variablelist">
               <dl>
                  <dt>
                     <tt class="literal">sdate</tt>
                  </dt>
                  <dd>
                     <p id="p7">To specify the starting date in “YYYY-MM-DD” format. If unspecified, defaults to the ending date.</p>
                  </dd>
                  <dt>
                     <tt class="literal">edate</tt>
                  </dt>
                  <dd>
                     <p id="p8">To specify the ending date. If unspecified, defaults to today.</p>
                  </dd>
                  <dt>
                     <tt class="literal">users</tt>
                  </dt>
                  <dd>
                     <p id="p9">To specify one or more users separated by spaces (<tt class="literal">+</tt> signs or <tt class="literal">%20</tt>’s in URI-speak). These should match the <tt class="literal">screen_name</tt> values in your account configuration. The value “<tt class="literal">ALL</tt>” is special: it lists tweets for <em>every</em> user, including all your friends.</p>
                  </dd>
                  <dt>
                     <tt class="literal">service</tt>
                  </dt>
                  <dd>
                     <p id="p10">To specify the service. If you set up accounts on multiple services, this lets you limit the result to messages from a single service.</p>
                  </dd>
               </dl>
            </div>
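            <p>If you'd rather build these URLs in a script than by hand, here is a minimal Python sketch. The function <tt class="literal">show_tweets_url</tt> is hypothetical; it assumes the application server is on port 8330 as above, and it omits any parameter you don't pass so that the script's own defaults apply.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import datetime
from urllib.parse import urlencode

# Adjust the port if you installed the application server elsewhere.
BASE = "http://localhost:8330/show-tweets.xqy"


def show_tweets_url(sdate=None, edate=None, users=None, service=None):
    """Build a show-tweets.xqy URL.

    Any argument left as None is omitted, so the script's own defaults
    apply (edate defaults to today, sdate to edate).  users is a list of
    screen names, or ["ALL"].
    """
    params = {}
    for key, value in (("sdate", sdate), ("edate", edate)):
        if value is not None:
            # Raises ValueError for values strptime can't parse as
            # %Y-%m-%d (e.g. 2009-02-30).
            datetime.datetime.strptime(value, "%Y-%m-%d")
            params[key] = value
    if users:
        # Users are space-separated; urlencode() encodes each space as "+".
        params["users"] = " ".join(users)
    if service:
        params["service"] = service
    query = urlencode(params)
    return BASE + ("?" + query if query else "")
</pre>
            </div>
            <p>For example, <tt class="literal">show_tweets_url("2009-08-01", "2009-08-31", ["ALL"])</tt> builds the August URL used below.</p>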
            <p id="p11">Go ahead and give it a try: <a href="http://localhost:8330/show-tweets.xqy?sdate=2009-08-01&amp;edate=2009-08-31&amp;users=ALL"
                  shape="rect">http://localhost:8330/show-tweets.xqy?sdate=2009-08-01&amp;edate=2009-08-31&amp;users=ALL</a> will show you all the messages posted in August 2009 by you and those you follow.</p>
            <div class="section">
               <h2 class="runin">Message Formatting </h2>
               <p class="runin" id="p12">
                  <a id="display" name="display" shape="rect"/>The messages are sorted in ascending order by date, so that messages read chronologically “down” the page as you'd expect. However, conversations are handled a little bit specially.</p>
               <p id="p13">Whenever a message is encountered that is part of a conversation (either because it's a reply to another message, or another message exists that is in reply to it), the whole thread is collected together and presented as a unit, like this:</p>
               <div class="artwork">
                  <div class="flickr-photo">
                     <div class="photo" style="width: 500px">
                        <a href="http://www.flickr.com/photos/ndw/3904233170/" shape="rect">
                           <img border="0" alt="[Photo]"
                                src="http://farm3.static.flickr.com/2643/3904233170_f112bd175e.jpg"/>
                        </a>
                     </div>
                     <div class="link" style="left: 225px;">
                        <a href="http://www.flickr.com/" shape="rect">
                           <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                        </a>
                     </div>
                     <h3>Message threading</h3>
                  </div>
               </div>
               <p id="p14">I think that makes the results much easier to follow. We'll come back to what to do about threads involving users other than you or your followers later.</p>
               <div class="admonition">
                  <table border="0" cellspacing="0" cellpadding="4"
                         summary="Presentation of a admonition">
                     <tbody>
                        <tr>
                           <td valign="top" rowspan="1" colspan="1">
                              <span class="admon-graphic">
                                 <img alt="Note" src="/graphics/note.png"/>
                              </span>
                           </td>
                           <td rowspan="1" colspan="1">
                              <div class="admon-text">
                                 <p id="p15">There were some bugs in the Identi.ca server that occasionally caused incorrect “in-reply-to” values to be inserted into the data. I think those <a href="http://identi.ca/notice/9260164" shape="rect">have been fixed</a> in the server, but there's nothing I can do about the values that are wrong.</p>
                              </div>
                           </td>
                        </tr>
                     </tbody>
                  </table>
               </div>
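                <p>The threading rule described above can be sketched in a few lines of Python. This is an illustration, not the code in the mbb04 modules: the message dictionaries (with <tt class="literal">id</tt>, <tt class="literal">date</tt>, and optional <tt class="literal">in_reply_to</tt> keys) are a hypothetical shape, and a reply to a message that isn't in the collection simply starts its own thread. Given bad “in-reply-to” data like that mentioned in the note, the walk up the reply chain also guards against loops.</p>
                <div class="programlisting">
                   <pre xml:space="preserve">
def thread_messages(messages):
    """Group messages into conversations.

    messages: iterable of dicts with 'id', 'date' (any sortable value)
    and optionally 'in_reply_to'.  Returns a list of threads; each thread
    is a chronologically sorted list of messages, and threads appear in
    the order in which their earliest message was posted.
    """
    by_id = {m["id"]: m for m in messages}

    def root_of(msg):
        seen = set()
        # Walk up the reply chain while the parent is a message we have.
        # The seen set stops the walk if bad data forms a loop; messages
        # caught in such a loop may end up in separate threads.
        while msg.get("in_reply_to") in by_id and msg["id"] not in seen:
            seen.add(msg["id"])
            msg = by_id[msg["in_reply_to"]]
        return msg["id"]

    threads = {}  # root id to messages; dicts keep insertion order
    for msg in sorted(by_id.values(), key=lambda m: m["date"]):
        threads.setdefault(root_of(msg), []).append(msg)
    return list(threads.values())
</pre>
                </div>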
            </div>
            <div class="section">
               <h2 class="runin">URL Rewriting </h2>
               <p class="runin" id="p16">
                  <a id="rewriting" name="rewriting" shape="rect"/>Those <tt class="filename">show-tweets.xqy</tt> URLs may be sufficient, but they're hardly elegant. I'd be much happier if they were better organized for human consumption.</p>
                <p id="p17">Luckily, the URL rewriting features of MarkLogic Server V4.1 make this easy. To begin, go back into the admin console (<a href="http://localhost:8001/" shape="rect">http://localhost:8001/</a>), navigate down through Groups→Default→App Servers in the tree control, and then select the server you set up for this project.</p>
               <p id="p18">Near the bottom of that page, you'll find a “url rewriter” field. Specify <tt class="literal">modules/url-rewriter.xqy</tt> as the value for that field and then click “Ok” at either the top or bottom of the page.</p>
                <p id="p19">The rewriter I provided supports a range of URLs designed to be more readable:</p>
               <div class="variablelist">
                  <dl>
                     <dt>
                        <tt class="uri">/my-tweets/<em class="replaceable">
                              <tt class="replaceable">service</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">users</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">start-date</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">end-date</tt>
                           </em>
                        </tt>
                     </dt>
                     <dd>
                        <p id="p20">Returns the messages associated with “users” on “service” between the specified start and end dates, e.g., <tt class="uri">/my-tweets/identica/ndw/2009-08-01/2009-08-15</tt>.</p>
                     </dd>
                     <dt>
                        <tt class="uri">/my-tweets/<em class="replaceable">
                              <tt class="replaceable">users</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">start-date</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">end-date</tt>
                           </em>
                        </tt>
                     </dt>
                     <dd>
                        <p id="p21">Returns the messages associated with “users” on any service between the specified start and end dates, e.g., <tt class="uri">/my-tweets/ndw/2009-08-01/2009-08-15</tt>.</p>
                     </dd>
                     <dt>
                        <tt class="uri">/my-tweets/<em class="replaceable">
                              <tt class="replaceable">start-date</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">end-date</tt>
                           </em>
                        </tt>
                     </dt>
                     <dd>
                        <p id="p22">Returns the messages associated with any user on any service between the specified start and end dates, e.g., <tt class="uri">/my-tweets/2009-08-01/2009-08-15</tt>.</p>
                     </dd>
                     <dt>
                        <tt class="uri">/my-tweets/<em class="replaceable">
                              <tt class="replaceable">start-date</tt>
                           </em>
                        </tt>
                     </dt>
                     <dd>
                        <p id="p23">Returns the messages associated with any user on any service between the specified start date and today, e.g., <tt class="uri">/my-tweets/2009-09-01</tt>.</p>
                     </dd>
                     <dt>
                        <tt class="uri">/my-tweets</tt>
                     </dt>
                     <dd>
                        <p id="p24">Returns the messages associated with any user on any service posted today.</p>
                     </dd>
                     <dt>
                        <tt class="uri">/all-tweets/<em class="replaceable">
                              <tt class="replaceable">service</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">users</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">start-date</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">end-date</tt>
                           </em>
                        </tt>
                     </dt>
                     <dd>
                        <p id="p25">Returns the messages posted by anyone on “service” between the specified start and end dates.</p>
                     </dd>
                     <dt>
                        <tt class="uri">/all-tweets/<em class="replaceable">
                              <tt class="replaceable">start-date</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">end-date</tt>
                           </em>
                        </tt>
                     </dt>
                     <dd>
                        <p id="p26">Returns the messages posted by anyone on any service between the specified start and end dates.</p>
                     </dd>
                     <dt>
                        <tt class="uri">/all-tweets/<em class="replaceable">
                              <tt class="replaceable">service</tt>
                           </em>/<em class="replaceable">
                              <tt class="replaceable">start-date</tt>
                           </em>
                        </tt>
                     </dt>
                     <dd>
                        <p id="p27">Returns the messages posted by anyone on “service” between the specified start date and today.</p>
                     </dd>
                     <dt>
                        <tt class="uri">/all-tweets/<em class="replaceable">
                              <tt class="replaceable">start-date</tt>
                           </em>
                        </tt>
                     </dt>
                     <dd>
                        <p id="p28">Returns the messages posted by anyone on any service between the specified start date and today.</p>
                     </dd>
                  </dl>
               </div>
                <p id="p29">The URL rewriter is just an XQuery module that you can change, so it can support any URL patterns you'd like. It receives the URL that the user requested, and whatever URL it returns is what the server actually responds to. The one I've provided turns</p>
               <div class="screen">
                  <pre xml:space="preserve">
/all-tweets/2009-08-15/2009-08-31
</pre>
               </div>
               <p id="p30">into</p>
               <div class="screen">
                  <pre xml:space="preserve">
/show-tweets.xqy?users=ALL&amp;sdate=2009-08-15&amp;edate=2009-08-31
</pre>
               </div>
                <p id="p31">So nothing else in the code has to change; it all just works.</p>
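                <p>The rewriter itself is XQuery; purely to illustrate the mapping, here is a sketch of the same idea in Python. The function name <tt class="literal">rewrite</tt> and the rule that date segments must come last are assumptions of this sketch, and it deliberately doesn't reproduce every pattern listed above (for example, it doesn't accept both a service and a users segment after <tt class="uri">/all-tweets</tt>).</p>
                <div class="programlisting">
                   <pre xml:space="preserve">
import re
from urllib.parse import urlencode

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # YYYY-MM-DD


def rewrite(path):
    """Map a friendly /my-tweets or /all-tweets path onto show-tweets.xqy.

    Anything that doesn't match is returned unchanged, so other requests
    (stylesheets, the scripts themselves) pass straight through.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] not in ("my-tweets", "all-tweets"):
        return path
    kind, rest = parts[0], parts[1:]
    dates = [p for p in rest if DATE.fullmatch(p)]
    names = [p for p in rest if not DATE.fullmatch(p)]
    # Dates must come last, and there can be at most two of them.
    if len(dates) > 2 or rest[len(names):] != dates:
        return path
    params = {}
    if kind == "all-tweets":
        if len(names) > 1:
            return path
        params["users"] = "ALL"
        service = names[0] if names else None
    else:
        if len(names) > 2:
            return path
        if names:
            params["users"] = names[-1]  # users is the last name segment
        service = names[0] if len(names) == 2 else None
    if service:
        params["service"] = service
    if dates:
        params["sdate"] = dates[0]
    if len(dates) == 2:
        params["edate"] = dates[1]
    query = urlencode(params)
    return "/show-tweets.xqy" + ("?" + query if query else "")
</pre>
                </div>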
               <p id="p32">I'm not sure there's much else that's new or interesting about the other modules provided in this part. If you have any questions, feel free to ask.</p>
               <p id="p33">Next time, we'll look at dealing with all those ugly shortened URLs and pesky replies that extend beyond our followers.</p>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Micro-blogging Backup, part the third</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/09/03/mbb03"/>
      <id>http://norman.walsh.name/2009/09/03/mbb03</id>
      <published>2009-09-03T20:12:40Z</published>
      <updated>2009-09-05T17:47:11Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <category term="microblogging" scheme="http://technorati.com/tag/"/>
      <dc:subject>Microblogging</dc:subject>
      <category term="www" scheme="http://technorati.com/tag/"/>
      <dc:subject>TheWeb</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>In which we peel back the covers on what's been built so far.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/09/03/mbb03">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>In which we peel back the covers on what's been built so far.</p>
            </div>
            <p id="p1">There's more functionality to come, but first, I thought it might be useful to spend a few minutes looking at what we've got so far.</p>
            <p id="p2">The setup code in <tt class="filename">/mbb/init</tt> isn't very interesting, and I'm not going to attempt to explain how CQ works, so we'll begin in the <tt class="filename">/mbb/modules</tt> directory.</p>
            <div class="variablelist">
               <dl>
                  <dt>
                     <tt class="filename">accounts.xqy</tt>
                  </dt>
                  <dd>
                     <p id="p3">This module contains some utility and convenience functions for dealing with account data. I changed my mind about how to store the data a couple of times early on, so these functions were supposed to protect me a little bit from that. I didn't follow through all the way, so the account abstraction is pretty leaky, but I left this module in place anyway.</p>
                  </dd>
                  <dt>
                     <tt class="filename">twitter.xqy</tt>
                  </dt>
                  <dd>
                     <p id="p4">This module is a thin skin over the actual <a href="http://apiwiki.twitter.com/Twitter-API-Documentation" shape="rect">Twitter API</a>. Ideally, I'd flesh this module out to support the rest of the endpoints, but I haven't bothered yet.</p>
                     <p id="p5">One school of thought holds that this kind of API module should be as thin as possible, adding nothing to the underlying API. I mostly agree, but I did take a few liberties. If you wanted to adapt this module for some other purpose, you might have reason to carve it a little closer to the bone.</p>
                     <p id="p6">One decision I made was to have the <tt class="methodname">account/rate_limit_status</tt> method return the number of calls remaining directly as a number, rather than returning the XML response. That's pretty simple. The other changes I made are a bit deeper.</p>
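                     <p>To make that concrete, here is a minimal Python sketch (not the module's actual XQuery) that reduces the status response to a bare number. It assumes the response is a <tt class="tag-starttag">&lt;hash&gt;</tt> element with a <tt class="tag-starttag">&lt;remaining-hits&gt;</tt> child. I believe that matches the Twitter XML format of the day, but treat it as an assumption. The function name <tt class="function">remaining_hits</tt> is hypothetical.</p>
                     <div class="programlisting">
                        <pre xml:space="preserve">
import xml.etree.ElementTree as ET


def remaining_hits(response_xml):
    """Return the number of API calls left, given a rate_limit_status reply.

    Assumes a reply shaped like hash/remaining-hits (an assumption about
    the Twitter XML format, not something taken from twitter.xqy).
    """
    root = ET.fromstring(response_xml)
    node = root.find("remaining-hits")
    if node is None or node.text is None:
        raise ValueError("no remaining-hits element in the response")
    return int(node.text.strip())
</pre>
                     </div>
                     <p>Callers can then compare an integer instead of digging through the response themselves.</p>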
                     <p id="p7">The Twitter timeline methods are designed to be “paged”: the caller selects the page size and the page to retrieve. I decided that what I really want is <em>all</em> the pages, so my versions of the timeline methods always request every page and return all of the results in a single call (performing the requisite paging for you, behind the scenes). Twitter limits you to 16 pages, but Identi.ca servers seem to offer more. To avoid recursing beyond the size of the call stack, I placed an arbitrary limit on the number of pages.</p>
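                     <p>The paging behaviour can be sketched in Python as a loop. It keeps asking for the next page until one comes back empty or a cap is reached. This is only an illustration. <tt class="function">fetch_page</tt> is a hypothetical stand-in for a single timeline request, and the default cap of 16 simply mirrors the Twitter limit mentioned above; it is not necessarily the limit used in <tt class="filename">twitter.xqy</tt>.</p>
                     <div class="programlisting">
                        <pre xml:space="preserve">
def fetch_all_pages(fetch_page, max_pages=16):
    """Collect every status from a paged timeline in one call.

    fetch_page(n) is a hypothetical callable returning the list of
    statuses on page n (1-based); an empty list means no more pages.
    max_pages is an arbitrary safety cap on the number of requests.
    """
    statuses = []
    page = 1
    while max_pages >= page:
        batch = fetch_page(page)
        if not batch:
            break  # the server has run out of pages
        statuses.extend(batch)
        page += 1
    return statuses
</pre>
                     </div>
                     <p>XQuery 1.0 has no general-purpose loop, so the XQuery version has to recurse, and that is where the call-stack concern comes from. In a language with loops, the cap only guards against a server that never runs out of pages.</p>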
                     <p id="p8">Finally, I decided to protect the caller from exceptions that can occur if the underlying HTTP requests fail. Most of the public methods in <tt class="filename">twitter.xqy</tt> return an element, either the Twitter API response, or a <tt class="tag-starttag">&lt;t:error&gt;</tt> element containing the HTTP error code if an error occurred.</p>
                     <p id="p9">I think an argument could be made for <em>not</em> doing this, for letting the lowest-level API calls throw the exception, but I decided not to. You're free to change that, of course.</p>
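                     <p>Here's a Python sketch of the same idea. It wraps the HTTP call so that a failure comes back as an error element rather than an exception. Three details are assumptions made for illustration: the callable <tt class="function">request</tt>, the namespace URI, and putting the status code in the element's text. The real shape of <tt class="tag-starttag">&lt;t:error&gt;</tt> is whatever <tt class="filename">twitter.xqy</tt> produces.</p>
                     <div class="programlisting">
                        <pre xml:space="preserve">
import urllib.error
import xml.etree.ElementTree as ET

# Placeholder URI for the "t:" prefix; the real one isn't given in the text.
T_NS = "http://example.com/ns/twitter"


def safe_call(request):
    """Run request() and always hand back an element, never an HTTP exception.

    request is a hypothetical zero-argument callable that performs the HTTP
    call and returns the parsed response element. On an HTTP error, a
    t:error element carrying the status code is returned instead.
    """
    try:
        return request()
    except urllib.error.HTTPError as err:
        error = ET.Element(f"{{{T_NS}}}error")
        error.text = str(err.code)
        return error
</pre>
                     </div>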
                  </dd>
                  <dt>
                     <tt class="filename">twitproc.xqy</tt>
                  </dt>
                  <dd>
                     <p id="p10">This module is mostly responsible for taking Twitter <tt class="tag-starttag">&lt;status&gt;</tt> and <tt class="tag-starttag">&lt;user&gt;</tt> elements and inserting them into the database. Along the way, we transform them just a little:</p>
                     <div class="orderedlist">
                        <ol style="list-style: decimal;">
                           <li>
                              <p id="p11">I move them from no namespace into the “t:” namespace. First, I subscribe to the position that XML vocabularies <a href="http://www.w3.org/TR/webarch/#use-namespaces" shape="rect">should place elements in a namespace</a>. I'm aware that there are people who believe otherwise. They're wrong. Second, <a href="http://en.wikipedia.org/wiki/XQuery" title="Wikipedia: XQuery"
                                    shape="rect">XQuery</a>’s interpretation of unqualified names <a href="http://norman.walsh.name/2008/07/02/xquery#p11" shape="rect">exacerbates the problem</a>. So
you could look at this as patching a bug in the Twitter API.</p>
                           </li>
                           <li>
                              <p id="p12">I transform the contents of the <tt class="tag-starttag">&lt;created_at&gt;</tt> element into ISO 8601 format (so it fits more naturally into the data model).</p>
                           </li>
                           <li>
                              <p id="p13">The Twitter APIs return a <tt class="tag-starttag">&lt;user&gt;</tt> element embedded in each <tt class="tag-starttag">&lt;status&gt;</tt> message. This is probably a net win for limiting round-trip calls to the API, but it doesn't strike me as a very sensible way to store things in the database. I break out the users and store them separately.</p>
                           </li>
                           <li>
                              <p id="p14">I add a few more elements to each status message. These record information about subsequent processing to perform (more about that later), the screen name of the user who uttered the message, and information about who was logged in when the message was retrieved.</p>
                              <p id="p15">This is a little lazy on my part. Arguably, I should introduce another namespace for these additional elements (so that some future Twitter API change doesn't walk all over them), or maybe not store them <em>in</em> the messages at all. I invite you to fix it if it bothers you.</p>
                              <p id="p16">If you know something about <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a>, this may sound like a job for document properties. That's a good idea, particularly for the downstream processing markers. However, document properties are associated, as the name suggests, with <em>documents</em> in the database. Later on, in the code for displaying messages, we're sometimes going to make a copy of a message (giving it a new parent element), and doing that breaks its association with the document properties. I was trying to keep things simple, so rather than using properties for one set of information and child nodes for another, I just pushed it all into child nodes. My bad.</p>
                           </li>
                        </ol>
                     </div>
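                     <p>The sketch below covers three of those transformations: the namespace move, the date conversion, and breaking out the user. It uses Python with ElementTree and is an illustration rather than a port of <tt class="filename">twitproc.xqy</tt>. The namespace URI is a placeholder, since the real “t:” URI isn't given here. The date pattern assumes Twitter's <tt class="literal">Fri Aug 28 18:16:44 +0000 2009</tt> style of timestamp.</p>
                     <div class="programlisting">
                        <pre xml:space="preserve">
import xml.etree.ElementTree as ET
from datetime import datetime

# Placeholder URI for the "t:" prefix; the real one isn't given in the text.
T_NS = "http://example.com/ns/twitter"


def to_namespace(elem):
    """Return a copy of elem with it and its descendants moved into T_NS.

    Assumes the input names are in no namespace, as Twitter sends them.
    """
    copy = ET.Element(f"{{{T_NS}}}{elem.tag}", elem.attrib)
    copy.text, copy.tail = elem.text, elem.tail
    for child in elem:
        copy.append(to_namespace(child))
    return copy


def iso_created_at(value):
    """Convert 'Fri Aug 28 18:16:44 +0000 2009' to ISO 8601.

    %a and %b match English day and month names in the default C locale.
    """
    return datetime.strptime(value, "%a %b %d %H:%M:%S %z %Y").isoformat()


def process_status(status):
    """Split a no-namespace status element into (status, user), both in T_NS.

    The embedded user is removed from the input element (which is modified
    in place) and returned separately so it can be stored on its own.
    """
    user = status.find("user")
    if user is not None:
        status.remove(user)
    created = status.find("created_at")
    if created is not None and created.text:
        created.text = iso_created_at(created.text.strip())
    new_user = to_namespace(user) if user is not None else None
    return to_namespace(status), new_user
</pre>
                     </div>
                     <p>Calling <tt class="function">process_status</tt> on a status element returns two things. The first is the namespaced status, with its <tt class="tag-starttag">&lt;created_at&gt;</tt> rewritten as, for example, <tt class="literal">2009-08-28T18:16:44+00:00</tt>. The second is the user, as a separate element ready to be stored on its own.</p>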
                  </dd>
                  <dt>
                     <tt class="filename">update.xqy</tt>
                  </dt>
                  <dd>
                     <p id="p17">This module wraps up the functionality of the <tt class="filename">twitter.xqy</tt> and <tt class="filename">twitproc.xqy</tt> modules, getting all the tweets for a user and inserting them into the database. The code for finding the most recent messages by (and not by) a particular user might be interesting to you. Ignore the <tt class="varname">$tweet-collection</tt> variable; it's a holdover from an earlier approach, no longer used.</p>
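                     <p>One plausible reason to find the newest message by a user, and the newest one by everyone else, is to know where to resume. For example, the result could serve as a <tt class="literal">since_id</tt> on the next request. That purpose is an assumption here, not something stated above. The lookup itself is small, as this Python sketch over a hypothetical list of <tt class="literal">(id, screen_name)</tt> pairs shows.</p>
                     <div class="programlisting">
                        <pre xml:space="preserve">
def latest_ids(statuses, screen_name):
    """Return (newest id by screen_name, newest id by anyone else).

    statuses is a hypothetical list of (id, author_screen_name) pairs;
    None stands in for "no such message yet".
    """
    mine = [sid for sid, who in statuses if who == screen_name]
    others = [sid for sid, who in statuses if who != screen_name]
    return (max(mine) if mine else None, max(others) if others else None)
</pre>
                     </div>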
                  </dd>
                  <dt>
                     <tt class="filename">get-new-tweets.xqy</tt>
                  </dt>
                  <dd>
                     <p id="p18">This module exists only to be invoked from another module. It declares an external variable that identifies a single account and then simply calls the <tt class="function">get-tweets</tt> function from the <tt class="filename">update.xqy</tt> module for that account.</p>
                  </dd>
               </dl>
            </div>
            <p id="p19">The last bit of code that we've got so far is <tt class="filename">get-tweets.xqy</tt> in the top level of the application server. This module loops over all the accounts that we've defined and, for each one, downloads and inserts any new status messages into the database. It does this by invoking the <tt class="filename">get-new-tweets.xqy</tt> module.</p>
            <div class="section">
               <h2 class="runin">What's all this invoking stuff about? </h2>
               <p class="runin" id="p20">
                  <a id="invoke" name="invoke" shape="rect"/>The server takes a completely safe, <a href="http://en.wikipedia.org/wiki/ACID" title="Wikipedia: ACID" shape="rect">transactional</a> approach to database updates. You are guaranteed that every query that updates the database either succeeds in its entirety or fails without changing anything. One of the things that you aren't allowed to do is make conflicting updates to the same document in the same transaction. You can demonstrate this easily; just run the following expression in CQ:</p>
               <div class="programlisting">
                  <pre xml:space="preserve">
let $doc := &lt;foo&gt;some document&lt;/foo&gt;
return
  (xdmp:document-insert("/scratch/foo", $doc),
   xdmp:document-insert("/scratch/foo", $doc))
</pre>
               </div>
               <p id="p21">The server will bark “XDMP-CONFLICTINGUPDATES” and no inserts will be made to the database.</p>
               <p id="p22">Why does this matter to us? Well, imagine that you set up two Twitter accounts in our micro-blogging backup system. Imagine further that both of those accounts follow <a href="http://twitter.com/marklogic" shape="rect">marklogic</a>.</p>
               <p id="p23">What's going to happen when we run the backup? Both accounts are going to download all of the status messages on their “friends” timeline, so they're both going to download all of the recent <a href="http://twitter.com/marklogic" shape="rect">marklogic</a> tweets. And they're both going to try to insert them into the database. And that's going to generate a “conflicting updates” error.</p>
               <p id="p24">Using the two-step <tt class="function">xdmp:invoke</tt> dance as shown in <tt class="filename">get-tweets.xqy</tt> and <tt class="filename">get-new-tweets.xqy</tt> avoids this problem. The semantics of <tt class="function">xdmp:invoke</tt> are that it runs the specified module in a <em>separate</em> transaction.</p>
               <p id="p25">Since no single user is going to download the same message twice, each transaction will succeed. In fact, some messages will get updated twice in the database, but that doesn't do any harm because the content of the message will be the same in each case.</p>
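               <p>The transaction rule is easy to model in Python. This toy is not how MarkLogic works internally. It rejects any transaction that inserts the same URI twice and applies nothing in that case. That shows why one combined transaction fails while one transaction per account succeeds, even when both accounts bring in the same tweet. The <tt class="function">run_transaction</tt> name and the URIs are made up.</p>
               <div class="programlisting">
                  <pre xml:space="preserve">
class ConflictingUpdates(Exception):
    """Stand-in for the server's XDMP-CONFLICTINGUPDATES error."""


def run_transaction(db, inserts):
    """Apply a list of (uri, document) inserts all-or-nothing.

    Toy rule: if two inserts in the same transaction target the same URI,
    nothing is written and ConflictingUpdates is raised.
    """
    uris = [uri for uri, _ in inserts]
    if len(set(uris)) != len(uris):
        raise ConflictingUpdates("XDMP-CONFLICTINGUPDATES")
    for uri, document in inserts:
        db[uri] = document
</pre>
               </div>
               <p>Passing both accounts' inserts to a single call raises the error and writes nothing. Calling it once per account succeeds, and a tweet that both accounts fetched is simply written twice with the same content.</p>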
               <p id="p26">An alternate approach to this problem is to manage the messages with greater care, identifying duplicates when they occur and not attempting to insert them in the database. This is the approach taken in <tt class="filename">twitproc.xqy</tt> for the simpler problem of dealing with duplicate <tt class="tag-starttag">&lt;user&gt;</tt>s.</p>
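               <p>Spotting duplicates up front, as <tt class="filename">twitproc.xqy</tt> does for users, amounts to keeping one record per ID before anything is inserted. Here is a Python sketch. It assumes each user record is a dictionary with an <tt class="literal">id</tt> key, which is a hypothetical representation. If the input is in newest-first order, as Twitter timelines are, the record kept is also the most recent copy of each user.</p>
               <div class="programlisting">
                  <pre xml:space="preserve">
def unique_users(users):
    """Keep the first record seen for each user id, preserving order.

    users is a hypothetical list of dicts with at least an "id" key.
    """
    seen = set()
    result = []
    for user in users:
        if user["id"] not in seen:
            seen.add(user["id"])
            result.append(user)
    return result
</pre>
               </div>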
               <p id="p27">It would certainly be possible to refactor the code so that the <tt class="function">xdmp:invoke</tt> call could be avoided, but in this case splitting work into several transactions feels like the more elegant solution. And any performance penalties associated with a few calls to <tt class="function">xdmp:invoke</tt> are going to be totally swamped by the latency in the underlying HTTP requests, so there isn't really a downside.</p>
            </div>
            <div class="section">
               <h2 class="runin">What next? </h2>
               <p class="runin" id="p28">
                  <a id="next" name="next" shape="rect"/>In the next part, we'll push a little further forward, getting some code in place to display the messages we've downloaded. We'll also look at the subsequent processing I hinted at. Further down the road, we'll look at search, and then we'll add some <a href="http://en.wikipedia.org/wiki/JavaScript" title="Wikipedia: JavaScript"
                     shape="rect">JavaScript</a>
                  <a href="/knows/what/Javascript" shape="rect">
                     <img border="0" alt="[L]" src="/graphics/linkgroup.gif"/>
                  </a>, refactor
things a bit, and make an AJAXy/Web 2.0 UI for our application.</p>
               <p id="p29">I hope you're enjoying the ride.</p>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>XML+XQuery+Google Voice+Python=WIN!</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/09/01/gvcall"/>
      <id>http://norman.walsh.name/2009/09/01/gvcall</id>
      <published>2009-09-01T13:45:54Z</published>
      <updated>2009-09-01T14:38:35Z</updated>
      <category term="googlevoice" scheme="http://technorati.com/tag/"/>
      <dc:subject>GoogleVoice</dc:subject>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <category term="rdf" scheme="http://technorati.com/tag/"/>
      <dc:subject>RDF</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>It's finally possible to put all the pieces together.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/09/01/gvcall">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>It's finally possible to put all the pieces together.</p>
            </div>
            <p id="p1">I've been storing my <a href="http://en.wikipedia.org/wiki/Personal_information_manager"
                  title="Wikipedia: Personal information manager"
                  shape="rect">PIM</a> data (contacts, appointments, etc.) in XML <a href="http://norman.walsh.name/2005/12/16/pimExample" title="PIM Example"
                  shape="rect">for ages</a> (and I do mean <a href="http://nwalsh.com/docs/presentations/extreme2002/" shape="rect">
                  <em>ages</em>
               </a>!). Since my <a href="http://en.wikipedia.org/wiki/Palm_%28PDA%29"
                  title="Wikipedia: Palm (PDA)"
                  shape="rect">Palm</a> days, I've been
translating whatever native format my PIM supports into XML. Where necessary (i.e. everywhere), I've used an RDF/N3-like annotation mechanism to support additional metadata.</p>
            <p id="p2">For example, the “notes” field for the XProc telcon looks like this:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
rdf:
p:class telcon
p:access public
p:phone #w3c-zakim
p:code 97762#
</pre>
            </div>
            <p id="p3">That gets parsed into the obvious XML by the conversion process:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;p:class&gt;telcon&lt;/p:class&gt;
&lt;p:access&gt;public&lt;/p:access&gt;
&lt;p:phone&gt;#w3c-zakim&lt;/p:phone&gt;
&lt;p:code&gt;97762#&lt;/p:code&gt;
</pre>
            </div>
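            <p>Here is a minimal Python sketch of that conversion, for illustration only. It assumes every annotation line is <tt class="literal">p:name value</tt>, with the value running to the end of the line. It also uses a made-up namespace URI for the <tt class="literal">p:</tt> prefix, since the real one isn't shown.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
import xml.etree.ElementTree as ET

# Placeholder URI for the "p:" prefix; the real one isn't given in the text.
P_NS = "http://example.com/ns/pim"
ET.register_namespace("p", P_NS)


def notes_to_xml(notes, parent_tag="notes"):
    """Parse an 'rdf:' notes block into an element with one child per line.

    Lines after the 'rdf:' marker look like 'p:name value'. Notes that
    don't start with the marker yield an empty element.
    """
    root = ET.Element(parent_tag)
    lines = [line.strip() for line in notes.strip().splitlines()]
    if not lines or lines[0] != "rdf:":
        return root
    for line in lines[1:]:
        if not line:
            continue
        name, _, value = line.partition(" ")
        prefix, _, local = name.partition(":")
        if prefix != "p" or not local:
            continue  # only the p: prefix is handled in this sketch
        ET.SubElement(root, f"{{{P_NS}}}{local}").text = value.strip()
    return root
</pre>
            </div>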
            <p id="p4">(The phone number “<tt class="literal">#w3c-zakim</tt>” means that there's an entry in my address book with the ID “<tt class="literal">w3c-zakim</tt>”. Why do none of the PIM applications understand that appointments and contacts are related!?)</p>
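            <p>Resolving such a reference takes only a lookup. In this Python sketch, the address book is a hypothetical dictionary keyed by entry ID. A value that doesn't start with <tt class="literal">#</tt> is assumed to be a literal number.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
def resolve_phone(value, address_book):
    """Turn a phone field into a dialable number.

    A value starting with '#' refers to the address-book entry with that
    ID; address_book is a hypothetical dict mapping IDs to dicts with a
    "phone" key. Anything else is returned unchanged.
    """
    if value.startswith("#"):
        entry_id = value[1:]
        if entry_id not in address_book:
            raise KeyError(f"no address-book entry {entry_id!r}")
        return address_book[entry_id]["phone"]
    return value
</pre>
            </div>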
            <p id="p5">I've been doing this for years <em>because it's The Right Thing™</em>, even though I've only been able to wring small (er, tiny, perhaps minuscule) amounts of practical value from it.</p>
            <p id="p6">But no more!</p>
            <p id="p7">The XML data is stored in my own <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a> instance (moving from a collection of <a href="http://en.wikipedia.org/wiki/Perl" title="Wikipedia: Perl" shape="rect">Perl</a> hacks to the server was one of my first personal projects after I joined <a href="http://www.marklogic.com" shape="rect">Mark Logic</a>). I now have a <a href="http://www.google.com/googlevoice/about.html" shape="rect">Google Voice</a> number. And <span class="personname">
                  <span class="firstname">Scott</span> 
                  <span class="surname">Hillman</span>
               </span>’s <a href="http://everydayscripting.blogspot.com/2009/08/python-google-voice-mass-sms-and-mass.html"
                  shape="rect">Python scripts</a> finally let me connect all the dots<sup class="footnote">[<a name="p7.7" href="#ftn.p7.7" id="p7.7" shape="rect">1</a>]</sup>!</p>
            <p id="p9">A little <a href="http://en.wikipedia.org/wiki/Python_%28programming_language%29"
                  title="Wikipedia: Python (programming language)"
                  shape="rect">Python</a> hacking, a quick <a href="http://en.wikipedia.org/wiki/XQuery" title="Wikipedia: XQuery"
                  shape="rect">XQuery</a> module, and I can make calls from a shell window.</p>
            <div class="screen">
               <pre xml:space="preserve">
$ call xproc
W3C XProc WG
+1-617-761-6200 97762#

Dialing +1-617-761-6200...
</pre>
            </div>
            <p id="p10">The call query searches both the appointments on today's calendar and the address book. That means “<strong class="command">call seth</strong>” works equally well, and calls my boss. For contacts with more than one phone number, I can add “<tt class="literal">-<em class="replaceable">
                     <tt class="replaceable">phone</tt>
                  </em>
               </tt>” on the end: “<strong class="command">call ndw@nwalsh.com -home</strong>” would call me at home, should I ever want to do that.</p>
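            <p>The argument handling can be sketched in Python as follows. Contacts are a hypothetical dictionary that maps lookup keys (names or email addresses) to a dictionary of phone types and numbers. The sketch leaves out the search of today's appointments, which the real query also performs.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
def pick_number(args, contacts):
    """Pick the number to dial for 'call NAME [-TYPE]'.

    contacts is a hypothetical dict mapping lower-case lookup keys (names
    or email addresses) to {phone_type: number} dicts.
    """
    if not args:
        raise ValueError("usage: call NAME [-TYPE]")
    key = args[0].lower()
    if key not in contacts:
        raise LookupError(f"no contact matches {args[0]!r}")
    numbers = contacts[key]
    if len(args) > 1:
        if not args[1].startswith("-"):
            raise ValueError("the phone type looks like -home, -work, ...")
        # A KeyError (a LookupError) here means no number of that type.
        return numbers[args[1][1:]]
    if len(numbers) != 1:
        raise LookupError(f"{args[0]} has several numbers; add -TYPE")
    return next(iter(numbers.values()))
</pre>
            </div>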
            <p id="p11">It's a tiny little thing, but it feels <em>great</em>.</p>
            <p id="p12">I'm easily amused, I know.</p>
            <p id="p13">Hmm. I should add an option to send SMS messages, too…</p>
            <div class="footnotes">
               <hr width="100" align="left" class="footnotes-divider"/>
               <div class="footnote">
                  <p id="p8">
                     <sup>[<a href="#p7.7" name="ftn.p7.7" id="ftn.p7.7" shape="rect">1</a>]</sup>Well, technically, reconnect, but for all its coolness, I never got much use out of <a href="http://norman.walsh.name/2003/11/05/tel" title="Automatic Dialing"
                        shape="rect">the DTMF auto dialer</a>.</p>
               </div>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Micro-blogging Backup, part the second</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/08/28/mbb02"/>
      <id>http://norman.walsh.name/2009/08/28/mbb02</id>
      <published>2009-08-28T18:16:44Z</published>
      <updated>2009-08-28T19:50:25Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <category term="microblogging" scheme="http://technorati.com/tag/"/>
      <dc:subject>Microblogging</dc:subject>
      <category term="www" scheme="http://technorati.com/tag/"/>
      <dc:subject>TheWeb</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>In which we set up the database one screen at a time and then import our first status messages.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/08/28/mbb02">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>In which we set up the database one screen at a time and then import our first status messages.</p>
            </div>
            <p id="p1">If you were following along <a href="http://norman.walsh.name/2009/08/27/mbb01"
                  title="Micro-blogging Backup, part the first"
                  shape="rect">yesterday</a>, you've got <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a> up and running with the <a href="http://developer.marklogic.com/about/whatiscis.xqy#community"
                  shape="rect">Community License</a>. Now it's time to start putting it to work. (Cutting toothpicks with a chainsaw, but hey, you have to start somewhere.)</p>
            <p id="p2">Well, almost. First, we have to do a little setup.</p>
            <p id="p3">Download <a href="examples/mbb02.zip" shape="rect">mbb02.zip</a> and unpack it somewhere convenient. I chose <tt class="filename">/home/ndw/mbb</tt> for the purposes of this example, but I'm not sure your home directory is really the best place. Put it anywhere you like; it doesn't matter to me.</p>
            <p id="p4">Fire up your favorite web browser and connect to the admin interface on port 8001 (<a href="http://localhost:8001/" shape="rect">http://localhost:8001/</a>, probably); you'll need to log in with whatever userid/password combination you selected at installation time.</p>
            <p id="p5">Once you're there, click “Forests” in the “Configure” tree control in the left hand column and then select the “Create” tab. Enter any name you'd like for the forest and click “ok”. I named mine “mbb”.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3865078512/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm4.static.flickr.com/3186/3865078512_ee28a5cdb4.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Create a forest</h3>
               </div>
            </div>
            <p id="p6">Forests are where the server stores XML documents. Trees, as it were. Clever, eh?</p>
            <p id="p7">Next, choose “Databases” in the tree control and select the “Create” tab again. Enter any name you'd like for the database and click “ok”. I named mine “mbb”. I can't think of a compelling reason to give them different names, but suit yourself.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3864296197/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2666/3864296197_8b9abcd82d.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Create a database</h3>
               </div>
            </div>
            <p id="p8">Once you've created a database, you'll be reminded that you need to attach a forest to the database.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3865078752/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2476/3865078752_c0be6ca29b.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>You must attach a forest to the database</h3>
               </div>
            </div>
            <p id="p9">Click on that link and do so. Remember to click “ok”.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3864296397/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2576/3864296397_fc437a339d.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Attach the forest you created</h3>
               </div>
            </div>
            <p id="p10">Almost there. Choose “Groups”, “Default”, and “App Servers” in the tree control, then select the “Create HTTP” tab. Enter any name you like for the server (I named mine “mbb”), the location where you unpacked the zip file as the root (I used <tt class="filename">/home/ndw/mbb</tt>), and an open port value for the port (I used “8330”).</p>
            <p id="p11">But <em>don't</em> click “ok” just yet. (If you already did, no worries, just click on the app server's name in the tree control.)</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3865078948/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2533/3865078948_347c5a8d72.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Create an HTTP application server</h3>
               </div>
            </div>
            <p id="p12">Scroll about halfway down the page to change the authentication and default user. Select “application level” for the authentication scheme and “admin” for the default user.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3865079022/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2605/3865079022_9250372166.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Change the authentication to application-level</h3>
               </div>
            </div>
            <p id="p13">This gives your application complete access to the server without having to log in. There are lots of ways to make an application more secure, but let's leave all the security knobs for another day. Now scroll to the top or bottom and click “ok”.</p>
            <p id="p14">At this point, you have a real honest-to-goodness application running on your server. (And yeah, this should all be simpler and easier. I've heard tell of plans to improve it, but nothing I can swear to.)</p>
            <p id="p15">I included a copy of “CQ”, a browser-based, interactive XQuery environment, in the distribution. You can see it if you navigate your browser to <a href="http://localhost:8330/cq" shape="rect">http://localhost:8330/cq</a>. (In this and all the following examples, if you chose a different port, use the port number you chose.)</p>
            <p id="p16">If you click on the “explore” link at the top of the CQ page, you'll see that you've got an empty database.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3865079102/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm4.static.flickr.com/3503/3865079102_6394379522.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>CQ shows the empty database</h3>
               </div>
            </div>
            <p id="p17">Now it's time to configure this particular database for our micro-blogging backup application. Later on, we're going to need some indexes. You could walk through the admin UI to create them, but that's tedious, you only have to do this once, and the admin UI is completely scriptable, so I created a little query to do the grunt work.</p>
            <p id="p18">Point your web browser at the database configuration script: <a href="http://localhost:8330/init/setup-database.xqy" shape="rect">http://localhost:8330/init/setup-database.xqy</a>. If everything is set up correctly, you'll quickly get a “database configured” message.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3865079144/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2467/3865079144_9dabcc9ed1.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Configure the database</h3>
               </div>
            </div>
            <p id="p19">Next, we need to configure the microblogging accounts that you want to back up. As with database configuration, you're probably only going to do this once (or at least once in a great while), so I didn't create any sort of UI for it.</p>
            <p id="p20">In the directory where you unpacked <tt class="filename">mbb02.zip</tt>, open up <tt class="filename">init/setup-accounts.xqy</tt> with your favorite text editor. On lines 57 and 58 replace <tt class="literal">SCREEN_NAME</tt> and <tt class="literal">PASSWORD</tt> with the <a href="http://twitter.com/" shape="rect">Twitter</a> username and password that you want to back up.</p>
            <p id="p21">If you're using <a href="http://identi.ca/" shape="rect">Identi.ca</a> instead, you'll have to do a little more editing, but it should be pretty straightforward. If you're using your own install of the Laconica software, or some other microblogging server, you should be able to figure out what to do, as long as it supports the Twitter API. Feel free to ask if you're not sure.</p>
            <p id="p22">When you've got all your accounts in place, save the file and point your web browser at it: <a href="http://localhost:8330/init/setup-accounts.xqy" shape="rect">http://localhost:8330/init/setup-accounts.xqy</a>.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3865079190/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm4.static.flickr.com/3493/3865079190_077522b8dc.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Configure your accounts</h3>
               </div>
            </div>
            <p id="p23">If all goes well, you'll get an appropriate “Accounts initialized” message. If you get 500 errors, you messed up the XQuery syntax somewhere. It won't do any harm to run the account setup script more than once, so try making small changes, running the script after each change. If you get stuck, let me know.</p>
            <p id="p24">If you go back to CQ again and click the “explore” link, you'll see that there are documents in the database now, one for each account you added.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3864296927/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm4.static.flickr.com/3452/3864296927_3f17eaca53.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>CQ shows the database with one document</h3>
               </div>
            </div>
            <p id="p25">Now we're ready to <em>really</em> do something.</p>
            <p id="p26">Point your web browser at <a href="http://localhost:8330/get-tweets.xqy" shape="rect">http://localhost:8330/get-tweets.xqy</a> to download your status messages. This may take a while, especially the first time and especially if you entered several accounts.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3864297111/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm4.static.flickr.com/3239/3864297111_18b9b23031.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Download the status messages for your account(s)</h3>
               </div>
            </div>
            <p id="p27">If you get a message about “rate limit exceeded”, it means you've made too many requests to the Twitter API this hour. Wait a bit and try again. Twitter threatens to turn off accounts that flagrantly violate the rate limit, so the MBB queries are careful not to.</p>
            <p id="p28">The “explore” link in CQ will now show a whole bunch of documents in the database.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3864297285/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm3.static.flickr.com/2643/3864297285_3d60b8eba4.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>CQ shows a database full of documents</h3>
               </div>
            </div>
            <p id="p29">You can enter any XQuery expression you like into CQ. Here I've asked for a count of all the messages that I've “favorited”.</p>
            <div class="artwork">
               <div class="flickr-photo">
                  <div class="photo" style="width: 500px">
                     <a href="http://www.flickr.com/photos/ndw/3864297405/" shape="rect">
                        <img border="0" alt="[Photo]"
                             src="http://farm4.static.flickr.com/3251/3864297405_f7c7e9ea74.jpg"/>
                     </a>
                  </div>
                  <div class="link" style="left: 225px;">
                     <a href="http://www.flickr.com/" shape="rect">
                        <img border="0" alt="[Flickr]" src="/graphics/flickrt.png"/>
                     </a>
                  </div>
                  <h3>Arbitrary XQuery expressions evaluated by CQ</h3>
               </div>
            </div>
            <p id="p30">In the next parts, we'll look at some of the code behind this functionality in a little more detail, add some XQuery to display the messages, look at how we can augment the messages in useful ways, add searching, and finally pull the pieces together into a useful little app. Well, a little app I think is useful, anyway.</p>
            <div class="section">
               <h2 class="runin">What about my older messages? </h2>
               <p class="runin" id="p31">
                  <a id="old" name="old" shape="rect"/>The Twitter API only lets you retrieve the most recent 3,200 or so status messages. If you've got older status messages that you've already backed up, or if you can find some other API to get at them, there are other ways to get them into the database.</p>
               <p id="p32">I left the skeleton of one of those ways in the <tt class="filename">init</tt> directory, an XProc pipeline that <a href="http://xmlcalabash.com/" shape="rect">XML Calabash</a> can run to load status messages from existing XML files.</p>
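               <p>The skeleton itself isn't reproduced here, but as a rough illustration, here's a minimal XProc 1.0 sketch of the loading half of the job. It assumes your archived messages are stored as <tt class="filename">.xml</tt> files in a single local directory; the <tt class="literal">archive</tt> directory name and the <tt class="literal">statuses</tt> wrapper element are placeholders of my own, and the step that actually inserts each document into the database is deliberately left out.</p>
               <div class="programlisting">
                  <pre xml:space="preserve">
&lt;p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0"&gt;
  &lt;p:output port="result"/&gt;

  &lt;!-- Hypothetical directory of previously archived status messages --&gt;
  &lt;p:option name="dir" select="'archive'"/&gt;

  &lt;!-- List the .xml files in that directory --&gt;
  &lt;p:directory-list&gt;
    &lt;p:with-option name="path" select="$dir"/&gt;
    &lt;p:with-option name="include-filter" select="'.*\.xml'"/&gt;
  &lt;/p:directory-list&gt;

  &lt;!-- Load each file; every iteration sees a single c:file element --&gt;
  &lt;p:for-each&gt;
    &lt;p:iteration-source select="/c:directory/c:file"/&gt;
    &lt;p:load&gt;
      &lt;p:with-option name="href" select="concat($dir, '/', /c:file/@name)"/&gt;
    &lt;/p:load&gt;
  &lt;/p:for-each&gt;

  &lt;!-- Wrap the loaded documents into one result for inspection; a real
       loader would insert each document into the database inside the loop --&gt;
  &lt;p:wrap-sequence wrapper="statuses"/&gt;
&lt;/p:declare-step&gt;
</pre>
               </div>
               <p>Keeping the insert out of the sketch means you can run it first just to check that your archive parses, then swap in whatever database step the skeleton uses.</p>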
               <p id="p33">If you've got your old tweets archived in XML, drop me a line and I'll try to point you in the right direction.</p>
            </div>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Micro-blogging Backup, part the first</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/08/27/mbb01"/>
      <id>http://norman.walsh.name/2009/08/27/mbb01</id>
      <published>2009-08-27T13:23:47Z</published>
      <updated>2009-08-27T16:02:44Z</updated>
      <category term="marklogic" scheme="http://technorati.com/tag/"/>
      <dc:subject>MarkLogic</dc:subject>
      <category term="microblogging" scheme="http://technorati.com/tag/"/>
      <dc:subject>Microblogging</dc:subject>
      <category term="www" scheme="http://technorati.com/tag/"/>
      <dc:subject>TheWeb</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>What started out as a trivial exercise in backing up my Twitter and Identi.ca posts turned into a little microcosm of XML Server application development. It's something you can deploy for free on your very own MarkLogic Server!</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/08/27/mbb01">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>What started out as a trivial exercise in backing up my Twitter and Identi.ca posts turned into a little microcosm of XML Server application development. It's something you can deploy for free on your very own MarkLogic Server!</p>
            </div>
            <p id="p1">This is the story of the intersection of two ideas:</p>
            <div class="orderedlist">
               <ol style="list-style: decimal;">
                  <li>
                     <p id="p2">First, almost no one I spoke to at <a href="http://balisage.net/" shape="rect">Balisage</a> had heard of the <a href="http://developer.marklogic.com/about/whatiscis.xqy#community"
                           shape="rect">Community License</a> for <a href="http://www.marklogic.com/product/marklogic-server.html" shape="rect">MarkLogic Server</a>, and the few who had heard of it thought it was still limited to just 100MB of content.</p>
                     <p id="p3">The fact that you can download and play with the best XML server on the planet is something more people should know about! The community license is for non-commercial use only, but it's free and it never expires. The previous 100MB content limit has been raised to 10GB, so there's a lot more room in the sandbox now.</p>
                  </li>
                  <li>
                     <p id="p4">Second, at about the same time, there was a little spike of interest in backing up microblogging data, the status messages that you send to services like <a href="http://twitter.com/" shape="rect">Twitter</a> or <a href="http://identi.ca/" shape="rect">Identi.ca</a>.</p>
                     <p id="p5">Sturgeon's law applies, of course, to microblogging. And Sturgeon was an optimist. But there's still a lot of useful information out there and I don't want it to disappear under the waves just because some acquisition occurs and the terms of service shift under my feet.</p>
                  </li>
               </ol>
            </div>
            <p id="p6">Luckily, the APIs for getting your microblogging content return XML and I have an XML server, so… my first thought was to download the tweets (a “<strong class="command">for</strong>” loop in <a href="http://en.wikipedia.org/wiki/Bash" title="Wikipedia: Bash" shape="rect">Bash</a> and <a href="http://en.wikipedia.org/wiki/Wget" title="Wikipedia: Wget" shape="rect">wget</a> will do the trick) and store them in the server. Then I thought, that's silly, the server can download them for me…</p>
            <p id="p7">From there, my little ten-minute exercise grew until I had a (still relatively small) application that handles oodles of documents from multiple services and accounts, has threaded conversations and account merging, uses indexes, has full-text and faceted search, employs web APIs, uses URI rewriting, and even has some AJAX.</p>
            <p id="p8">And because status messages are small, it'll run for <em>ages</em> under the community license.</p>
            <p id="p9">My plan, therefore, is to spin this out over a few essays, building the app from its barest bones to something I'm finding quite useful. If you want to play along, the first step is to go get a copy of MarkLogic Server and install it with the community license. The steps are roughly these:</p>
            <div class="orderedlist">
               <ol style="list-style: decimal;">
                  <li>
                     <p id="p10">
                        <a href="http://dev.marklogic.com/download/" shape="rect">Download</a> version 4.1 of the server. It runs on Windows, Linux, and SPARC boxes. (It isn't, alas, available for <a href="http://en.wikipedia.org/wiki/Mac_OS_X" title="Wikipedia: Mac OS X"
                           shape="rect">OS X</a>, but it runs just fine under virtualization.)</p>
                     <p id="p11">It also <a href="http://strangelylooping.wordpress.com/2009/06/14/marklogic-server-on-ubuntu-9-04/"
                           shape="rect">runs just fine</a> on the <a href="http://en.wikipedia.org/wiki/Debian" title="Wikipedia: Debian"
                           shape="rect">Debian</a> flavors of <a href="http://en.wikipedia.org/wiki/Linux" title="Wikipedia: Linux" shape="rect">Linux</a>, though that's not an officially supported platform. Just make sure you have the <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519817" shape="rect">bugfixed version</a> of <span class="package">lsb-base</span>.</p>
                  </li>
                  <li>
                     <p id="p12">After it's installed and running, point your web browser at <tt class="uri">http://localhost:8001/</tt> on the machine where you installed it.</p>
                  </li>
                  <li>
                     <p id="p13">Click on the “free” license button, choose the community license, and click your way through the rest of the install screens.</p>
                  </li>
               </ol>
            </div>
            <p id="p14">Congratulations! You now have the most powerful XML chainsaw imaginable at your fingertips. Exactly what to do with it is the subject of part the second and beyond.</p>
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
   <entry>
      <title>Using XML Catalogs and XProc together</title>
      <link rel="alternate" type="text/html"
            href="http://norman.walsh.name/2009/07/22/xmlCatalogsandXProc"/>
      <id>http://norman.walsh.name/2009/07/22/xmlCatalogsandXProc</id>
      <published>2009-07-22T20:15:27Z</published>
      <updated>2009-07-22T20:51:31Z</updated>
      <category term="calabash" scheme="http://technorati.com/tag/"/>
      <dc:subject>Calabash</dc:subject>
      <category term="xmlcatalogs" scheme="http://technorati.com/tag/"/>
      <dc:subject>XMLCatalogs</dc:subject>
      <category term="xproc" scheme="http://technorati.com/tag/"/>
      <dc:subject>XProc</dc:subject>
      <summary type="xhtml">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>XML Calabash, my implementation of XProc, is my go-to tool these days for manipulating XML documents. Adding XML Catalogs into the mix just makes it sweeter.</p>
         </div>
      </summary>
      <content type="xhtml" xml:base="http://norman.walsh.name/2009/07/22/xmlCatalogsandXProc">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <div class="abstract">
               <p>XML Calabash, my implementation of XProc, is my go-to tool these days for manipulating XML documents. Adding XML Catalogs into the mix just makes it sweeter.</p>
            </div>
            <p id="p1">Recently, I was presented with several hundred books composed of many thousands of chapters. My goal: load them into the server so that they could become part of a larger application. Easy peasy.</p>
            <p id="p2">Two snags: all the chapters contained references to named entities declared in an external subset, and none of the metadata in the files was actually reliable.</p>
            <p id="p3">Still pretty straightforward: parse the documents to expand the entity references, do a little cleanup, and push them into the database. The details of the pipeline aren't that important; the bit I want to highlight today is the parsing.</p>
            <p id="p4">Everything remained pretty easy until I discovered that there were a half-dozen or more flavors of DTD in use across this corpus. And naturally, every external subset was referenced <em>only</em> by a system identifier with some random, absolute path:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;!DOCTYPE chapter SYSTEM "/path/to/dtd10.dtd"&gt;
</pre>
            </div>
            <p id="p5">Where the “10” might be “10”, “11”, “21”, “25”, and so on, across a set of versions large enough to go well beyond my limit for tedium.</p>
            <p id="p6">Luckily, all of them included a standard suite of ISO entities and (as far as I could easily tell) those were the only entities the documents ever referenced.</p>
            <p id="p7">XML Catalogs to the rescue.</p>
            <p id="p8">First, grab a recent version of the DTD and stick it somewhere local, then construct the following catalog:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog"&gt;
  &lt;!-- Map every system identifier ending in ".dtd" to one local copy --&gt;
  &lt;systemSuffix systemIdSuffix=".dtd" uri="local/dtd21.dtd"/&gt;
&lt;/catalog&gt;
</pre>
            </div>
            <p id="p9">Next, tell <a href="http://norman.walsh.name/2008/projects/calabash"
                  title="XML Calabash: an XProc implementation"
                  shape="rect">XML Calabash</a> to use catalogs. You can do this from the command line, but I set it up in my configuration file, <tt class="filename">~/.calabash</tt>:</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;cc:xproc-config xmlns:cc="http://xmlcalabash.com/ns/configuration"&gt;
  &lt;cc:schema-aware&gt;false&lt;/cc:schema-aware&gt;
  &lt;cc:log-level level="warning"/&gt;
  &lt;cc:serialization
      omit-xml-declaration="false"/&gt;
  &lt;!-- Resolve external entities and URIs through the XML Resolver catalog --&gt;
  &lt;cc:entity-resolver class-name="org.xmlresolver.Resolver"/&gt;
  &lt;cc:uri-resolver class-name="org.xmlresolver.Resolver"/&gt;
&lt;/cc:xproc-config&gt;
</pre>
            </div>
            <p id="p10">The first few lines just set some defaults I like; it's the last two that are relevant here. They tell <em class="citetitle">XML Calabash</em> to use my <a href="http://xmlresolver.org/" shape="rect">XML Resolver</a> catalog implementation for entity and URI resolution.</p>
            <p id="p11">Now my pipeline simply does The Right Thing™.</p>
            <p id="p12">When the parser attempts to load the external subset, the catalog resolver returns the local DTD (because all the system identifiers end with “<tt class="literal">.dtd</tt>”). The <tt class="tag-starttag">&lt;p:load&gt;</tt> step doesn't perform DTD validation by default, so the fact that some of the files aren't valid according to the particular version of the DTD that I have locally doesn't matter. The entities get expanded correctly. (If any of the documents had relied on entities present only in a particular version of the DTD, that would have been an error, so I know I didn't miss any.) I do a couple of lightweight transformations on the resulting document and shove it into the database FTW!</p>
            <p id="p13">Nothing earth-shattering here, and not the only way to solve the problem, but it's a problem that looks like a nail to my particular hammer of choice at the moment.</p>
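            <p>For anyone who hasn't written an XProc pipeline, here's a minimal sketch of what the parsing end might look like. It isn't the actual pipeline: the <tt class="literal">href</tt> option and the <tt class="literal">metadata</tt> element removed by <tt class="tag-starttag">&lt;p:delete&gt;</tt> are hypothetical stand-ins for the real input and the “lightweight transformations”. It assumes XML Calabash has been configured with the catalog resolver as in the configuration file above, so the external subset is fetched from the local copy while each document is parsed.</p>
            <div class="programlisting">
               <pre xml:space="preserve">
&lt;p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0"&gt;
  &lt;p:output port="result"/&gt;

  &lt;!-- Hypothetical option: the URI of the chapter to load --&gt;
  &lt;p:option name="href" required="true"/&gt;

  &lt;!-- Parse the chapter. The external subset is resolved through the
       catalog, so the ISO entity references are expanded. DTD validation
       is off by default; it is spelled out here only for clarity. --&gt;
  &lt;p:load dtd-validate="false"&gt;
    &lt;p:with-option name="href" select="$href"/&gt;
  &lt;/p:load&gt;

  &lt;!-- Example cleanup (hypothetical element name): discard the
       unreliable metadata before the document goes to the database --&gt;
  &lt;p:delete match="metadata"/&gt;
&lt;/p:declare-step&gt;
</pre>
            </div>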
            <div id="newcomment"/>
            <div class="footer"/>
         </div>
      </content>
   </entry>
</feed>