Micro-blogging Backup, part the first

Volume 12, Issue 25; 27 Aug 2009; last modified 08 Oct 2010

What started out as a trivial exercise in backing up my Twitter and Identi.ca posts turned into a little microcosm of XML Server application development. It's something you can deploy for free on your very own MarkLogic Server!

This is the story of the intersection of two ideas:

  1. First, almost no one that I spoke to at Balisage had heard of the Community License for MarkLogic Server, and those few who had thought that it was still limited to just 100MB of content.

    The fact that you can download and play with the best XML server on the planet is something more people should know about! The community license is for non-commercial use only, but it's free and it never expires. The previous 100MB content limit has been upped to 10GB, so there's a lot more room in the sandbox now.

  2. Second, at about the same time, there was a little spike of interest in backing up microblogging data, the status messages that you send to services like Twitter or Identi.ca.

    Sturgeon's law applies, of course, to microblogging. And Sturgeon was an optimist. But there's still a lot of useful information out there and I don't want it to disappear under the waves just because some acquisition occurs and the terms of service shift under my feet.

Luckily, the APIs for getting your microblogging content return XML and I have an XML server, so… my first thought was to download the tweets (a “for” loop in Bash and wget will do the trick) and store them in the server. Then I thought, that's silly, the server can download them for me…
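For the curious, the do-it-by-hand approach might look something like the sketch below. It is only an illustration: the base URL, the `page` parameter, and the function names are placeholders I've made up for the example, not any service's documented API, so substitute the real timeline URL for your account.

```shell
#!/usr/bin/env bash
# Sketch only: BASE_URL and the ?page=N parameter are assumptions,
# not a documented endpoint. Point BASE_URL at your service's timeline URL.
BASE_URL="${BASE_URL:-https://api.example.com/statuses/user_timeline}"

# Build the URL for one page of a user's timeline (hypothetical URL layout).
page_url() {
  local user="$1" page="$2"
  printf '%s/%s.xml?page=%d\n' "$BASE_URL" "$user" "$page"
}

# Fetch pages 1..max_pages into out_dir, one XML file per page.
backup_timeline() {
  local user="$1" max_pages="$2" out_dir="$3" page file
  mkdir -p "$out_dir" || return 1
  for page in $(seq 1 "$max_pages"); do
    file="$out_dir/$user-$page.xml"
    # -q: quiet; -O: write to the named file.
    # On the first failed fetch, drop the partial file and stop.
    wget -q -O "$file" "$(page_url "$user" "$page")" || { rm -f "$file"; break; }
    sleep 1  # be polite to the service
  done
}
```

Calling `backup_timeline example_user 5 ./backup` would then try to leave up to five XML files in `./backup`, ready to be loaded into the server by whatever means you like.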

From there, my little ten-minute exercise grew until I had a (still relatively small) application that handles oodles of documents from multiple services and accounts, has threaded conversations and account merging, uses indexes, has full-text and faceted search, employs web APIs, uses URI rewriting, and even has some AJAX.

And because status messages are small, it'll run for ages under the community license.

My plan, therefore, is to spin this out over a few essays, building the app from its barest bones to something I'm finding quite useful. If you want to play along, the first step is to go get a copy of MarkLogic Server and install it with the community license. The steps are roughly these:

  1. Download version 4.1 of the server. It runs on Windows, Linux, and Sparc boxes. (It isn't, alas, available for OS X, but it runs just fine under virtualization.)

    It also runs just fine on the Debian flavors of Linux, though that's not an officially supported platform. Just make sure you have the bugfixed version of lsb-base.

  2. After it's installed and running, point your web browser at http://localhost:8001/ on the machine where you installed it.

  3. Click on the “free” license button, choose the community license, and click your way through the rest of the install screens.

Congratulations! You now have the most powerful XML chainsaw imaginable at your fingertips. Exactly what to do with it is the subject of part the second and beyond.