<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
<title>Micro-blogging Backup, part the first</title><biblioid class="uri">http://norman.walsh.name/2009/08/27/mbb01</biblioid>
<volumenum>12</volumenum>
<issuenum>25</issuenum>
<pubdate>2009-08-27T09:23:47-04:00</pubdate>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2009</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>What started out as a trivial exercise in backing up my Twitter and
Identi.ca posts turned into a little microcosm of XML server application
development. It's something you can deploy for free on your very
own MarkLogic Server!</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#MarkLogic"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#Microblogging"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#TheWeb"/>
</info>

<para xml:id="p1">This is the story of the intersection of two ideas:</para>

<orderedlist>
<listitem>
<para xml:id="p2">First, almost no one that I spoke to at 
<link xlink:href="http://balisage.net/">Balisage</link> had heard of
the
<link xlink:href="http://developer.marklogic.com/about/whatiscis.xqy#community">Community License</link> for 
<link xlink:href="http://www.marklogic.com/product/marklogic-server.html">MarkLogic
Server</link>, and the few who had heard of it thought it was still limited to just
100MB of content.
</para>
<para xml:id="p3">The fact that you can download and play with the best XML server
on the planet is something more people should know about! The
community license is for non-commercial use only, but it's free and it
never expires. The previous 100MB content limit has been upped to 10GB,
so there's a lot more room in the sandbox now.</para>
</listitem>
<listitem>
<para xml:id="p4">Second, at about the same time, there was a little spike of
interest in backing up microblogging data, the status messages that
you send to services like
<link xlink:href="http://twitter.com/">Twitter</link> or 
<link xlink:href="http://identi.ca/">Identi.ca</link>.</para>
<para xml:id="p5">Sturgeon's law applies, of course, to microblogging. And Sturgeon was
an optimist. But there's still a lot of useful information out there and
I don't want it to disappear under the waves just because some acquisition occurs
and the terms of service shift under my feet.</para>
</listitem>
</orderedlist>

<para xml:id="p6">Luckily, the APIs for getting your microblogging content return XML,
and I have an XML server, so…
my first thought was to download the tweets (a “<command>for</command>” loop in
<wikipedia>Bash</wikipedia> and <wikipedia page="Wget">wget</wikipedia>
will do the trick) and store them in the server. Then I thought: that's
silly; the server can download them for me…</para>
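<para>For comparison, here is a minimal sketch of that client-side
“loop and download” idea, written in Python rather than Bash so that it can
be read and tested on its own. It is only an illustration, not code from the
application described in this series. The <code>TIMELINE_URL</code> template
is a hypothetical placeholder for whatever paged, XML-returning timeline
endpoint a given service provides, and the <code>page-N.xml</code> file
naming is likewise an assumption.</para>
<programlisting language="python"><![CDATA[
```python
"""Hypothetical sketch of the 'for loop plus wget' backup idea.

TIMELINE_URL is an assumption for illustration only; substitute the paged,
XML-returning endpoint your microblogging service actually offers.
"""
import urllib.request
from pathlib import Path

# Hypothetical URL template: {user} and {page} are filled in per request.
TIMELINE_URL = "http://example.com/statuses/user_timeline/{user}.xml?page={page}"


def page_urls(user, pages, template=TIMELINE_URL):
    """Return one URL per page, 1..pages, like the body of a shell for loop."""
    return [template.format(user=user, page=n) for n in range(1, pages + 1)]


def fetch(url):
    """Download one URL and return its raw bytes (what wget would save)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def backup(user, pages, outdir, fetcher=fetch):
    """Save each page of XML as outdir/page-N.xml; return the paths written."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for n, url in enumerate(page_urls(user, pages), start=1):
        path = outdir / f"page-{n}.xml"
        path.write_bytes(fetcher(url))
        written.append(path)
    return written
```
]]></programlisting>
<para>A call such as <code>backup("someuser", 5, "tweets")</code> would write
<filename>tweets/page-1.xml</filename> through
<filename>tweets/page-5.xml</filename>. The fixed page count mirrors a simple
shell loop; a real backup would also need to stop when a page comes back
empty, and the <code>fetcher</code> parameter exists only so the download step
can be swapped out, for example in a test.</para>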

<para xml:id="p7">From there, my little ten-minute exercise grew until I had a
(still relatively small) application that handles oodles of documents
from multiple services and accounts, has threaded conversations and
account merging, uses indexes, has full-text and faceted search,
employs web APIs, uses URI rewriting, and even has some AJAX.
</para>

<para xml:id="p8">And because status messages are small, it'll run for
<emphasis>ages</emphasis> under the community license.</para>

<para xml:id="p9">My plan, therefore, is to spin this out over a few essays, building the
app from its barest bones to something I'm finding quite useful. If you want
to play along, the first step is to go get a copy of MarkLogic Server and
install it with the community license. The steps are roughly these:</para>

<orderedlist>
<listitem>
<para xml:id="p10"><link xlink:href="http://dev.marklogic.com/download/">Download</link>
version 4.1 of the server. It runs on Windows, Linux, and SPARC boxes.
(It isn't, alas, available for <wikipedia page="Mac_OS_X">OS X</wikipedia>,
but it runs just fine under virtualization.)</para>
<para xml:id="p11">It also
<link xlink:href="http://strangelylooping.wordpress.com/2009/06/14/marklogic-server-on-ubuntu-9-04/">runs just fine</link> on the <wikipedia>Debian</wikipedia> flavors
of <wikipedia>Linux</wikipedia>, though that's not an officially supported
platform. Just make sure you have the
<link xlink:href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519817">bugfixed
version</link> of <package>lsb-base</package>.</para>
</listitem>
<listitem>
<para xml:id="p12">After it's installed and running, point your web browser at
<uri>http://localhost:8001/</uri> on the machine where you installed it.
</para>
</listitem>
<listitem>
<para xml:id="p13">Click on the “free” license button, choose the community license, and
click your way through the rest of the install screens.</para>
</listitem>
</orderedlist>

<para xml:id="p14">Congratulations! You now have the most powerful XML chainsaw imaginable
at your fingertips. Exactly what to do with it is the subject of
part the second and beyond.
</para>

</essay>

