Micro-blogging Backup, part the second

Volume 12, Issue 26; 28 Aug 2009; last modified 08 Oct 2010

In which we set up the database one screen at a time and then import our first status messages.

If you were following along yesterday, you've got MarkLogic Server up and running with the Community License. Now it's time to start putting it to work. (Cutting toothpicks with a chainsaw, but hey, you have to start somewhere.)

Well, almost. First, we have to do a little setup.

Download mbb02.zip and unpack it somewhere convenient. I chose /home/ndw/mbb for the purposes of this example, but I'm not sure your home directory is really the best place. Anywhere you'd like, though; it doesn't matter to me.

Fire up your favorite web browser and connect to the admin interface on port 8001 (http://localhost:8001/, probably); you'll need to log in with whatever userid/password combination you selected at installation time.

Once you're there, click “Forests” in the “Configure” tree control in the left-hand column and then select the “Create” tab. Enter any name you'd like for the forest and click “ok”. I named mine “mbb”.

Forests are where the server stores XML documents. Trees, as it were. Clever, eh?

Next, choose “Databases” in the tree control and select the “Create” tab again. Enter any name you'd like for the database and click “ok”. I named mine “mbb”. I can't think of a compelling reason to give them different names, but suit yourself.

Once you've created a database, you'll be reminded that you need to attach a forest to the database.

Click on that link and do so. Remember to click “ok”.

Almost there. Choose “Groups”, “Default”, and “App Servers” in the tree control, then select the “Create HTTP” tab. Enter any name you'd like for the server name (I named mine “mbb”); enter the location where you unpacked the zip file for the root (I used /home/ndw/mbb); and enter an open port value for the port (I used “8330”).

But don't click “ok” just yet. (If you already did, no worries, just click on the app server's name in the tree control.)

Scroll about halfway down the page to change the authentication and default user. Select “application level” for the authentication scheme and “admin” for the default user.

This gives your application complete access to the server without having to log in. There are lots of ways to make an application more secure, but let's leave all the security knobs for another day. Now scroll to the top or bottom and click “ok”.

At this point, you have a real honest-to-goodness application running on your server. (And yeah, this should all be simpler and easier. I've heard tell of plans to improve it, but nothing I can swear to.)
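For the curious, most of the clicking above can also be done with the Admin API. What follows is only a sketch, not something shipped in mbb02.zip: it assumes you have somewhere to run ad hoc XQuery as an admin user (a copy of CQ on an existing app server, say), that the stock “Security” and “Schemas” databases and the “Default” group are present, and it reuses my names, path, and port from above. Each semicolon-separated statement runs as its own transaction, so the forest and database exist before anything refers to them.

```xquery
xquery version "1.0-ml";
(: Sketch only: create the forest and the database, both named "mbb". :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $config := admin:forest-create($config, "mbb", xdmp:host(), ())
let $config := admin:database-create($config, "mbb",
                   xdmp:database("Security"), xdmp:database("Schemas"))
return admin:save-configuration($config);

xquery version "1.0-ml";
(: Second transaction: attach the new forest to the new database. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $config := admin:database-attach-forest($config,
                   xdmp:database("mbb"), xdmp:forest("mbb"))
return admin:save-configuration($config);

xquery version "1.0-ml";
(: Third transaction: an HTTP app server rooted on the file system.
   The root path and port are the ones used in this walkthrough. :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
let $config := admin:http-server-create($config, $group, "mbb",
                   "/home/ndw/mbb", 8330, "file-system",
                   xdmp:database("mbb"))
return admin:save-configuration($config)
```

The sketch leaves the authentication scheme and default user alone; set those in the admin UI as described above, or look at admin:appserver-set-authentication and admin:appserver-set-default-user.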

I included a copy of “CQ”, a browser-based, interactive XQuery environment, in the distribution. You can see it if you navigate your browser to http://localhost:8330/cq. (In this and all the following examples, if you chose a different port, use the port number you chose.)

If you click on the “explore” link at the top of the CQ page, you'll see that you've got an empty database.

Now it's time to configure this particular database for our micro-blogging backup application. Later on, we're going to need some indexes. You could walk through the admin UI to create them, but that's tedious, you only have to do this once, and the admin UI is completely scriptable, so I created a little query to do the grunt work.

Point your web browser at the database configuration script: http://localhost:8330/init/setup-database.xqy. If everything is set up correctly, you'll quickly get a “database configured” message.

Next, we need to configure the microblogging accounts that you want to back up. Like database configuration, you're probably only going to do this once (or at least once in a great while), so I didn't create any sort of UI for it.

In the directory where you unpacked mbb02.zip, open up init/setup-accounts.xqy with your favorite text editor. On lines 57 and 58 replace SCREEN_NAME and PASSWORD with the Twitter username and password that you want to back up.

If you're using Identi.ca instead, you'll have to do a little more editing, but it should be pretty straightforward. If you're using your own install of the Laconica software, or you're using some other microblogging server, as long as it supports the Twitter API, you should be able to figure out what to do. Feel free to ask if you're not sure.

When you've got all your accounts in place, save the file and point your web browser at it: http://localhost:8330/init/setup-accounts.xqy.

If all goes well, you'll get an appropriate “Accounts initialized” message. If you get 500 errors, you've most likely messed up the XQuery syntax somewhere. It won't do any harm to run the account setup script more than once, so try making small changes, running the script after each change. If you get stuck, let me know.

If you go back to CQ again and click the “explore” link, you'll see that there are documents in the database now, one for each account you added.

Now we're ready to really do something.

Point your web browser at http://localhost:8330/get-tweets.xqy to download your status messages. This may take a while, especially the first time and especially if you entered several accounts.

If you get a message about “rate limit exceeded”, it means you've done too many interactions with the Twitter API this hour. Wait a bit and try again. Twitter threatens that they'll turn off your account if you flagrantly violate the rate limit, so the MBB queries are pretty careful not to.

The “explore” link in CQ will now show a whole bunch of documents in the database.

You can enter any arbitrary XQuery expressions you'd like into CQ. Here I've asked for a count of all the messages that I've “favorited”.
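A query of that sort might look something like the sketch below. The element names are assumptions on my part for the purposes of illustration: it supposes each status message is stored as a t:status element with a t:favorited child holding “true” or “false”, mirroring the XML the Twitter API returns, and it supposes the documents live in the tweets namespace declared here. Check a document or two with “explore” and adjust the names to match what you actually see.

```xquery
xquery version "1.0-ml";
(: Assumed namespace for the stored status messages. :)
declare namespace t="http://www.marklogic.com/ns/nwalsh/twitter/tweets";

(: Count the status documents whose (assumed) favorited flag is "true". :)
count(/t:status[t:favorited = "true"])
```

Paste it into the CQ text area and evaluate it; the result is a single number.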

In the next parts, we'll look at some of the code behind this functionality in a little more detail, add some XQuery to display the messages, look at how we can augment the messages in useful ways, add searching, and finally pull the pieces together into a useful little app. Well, a little app I think is useful, anyway.

What about my older messages?

Twitter only lets you get at the last 3,200 or so status messages with the Twitter API. If you've got older status messages that you've already backed up, or if you can find some other API to get at them, there are other ways to get them in the database.

I left the skeleton of one of those ways in the init directory, an XProc pipeline that XML Calabash can run to load status messages from existing XML files.
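If you'd rather stay inside CQ than run a pipeline, a rough XQuery alternative is sketched below. To be clear, this is not the XProc pipeline from the init directory; it's a different route that simply loads every .xml file it finds in one directory on the server. The directory /home/ndw/old-tweets and the “/archive/” URI prefix are made up for illustration, and the sketch assumes each file already holds a status message in whatever form the rest of MBB expects.

```xquery
xquery version "1.0-ml";
declare namespace dir = "http://marklogic.com/xdmp/directory";

(: Hypothetical location of previously archived status messages. :)
declare variable $archive := "/home/ndw/old-tweets";

(: List the directory on the server and load each .xml file found there.
   The "/archive/" URI prefix is an assumption, not an MBB convention. :)
for $entry in xdmp:filesystem-directory($archive)/dir:entry
where $entry/dir:type = "file"
  and ends-with($entry/dir:filename, ".xml")
return
  xdmp:document-load(
    string($entry/dir:pathname),
    <options xmlns="xdmp:document-load">
      <uri>{concat("/archive/", $entry/dir:filename)}</uri>
    </options>)
```

Run it against the “mbb” database and then check “explore” to see whether the documents landed where you expected.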

If you've got your old tweets archived in XML, drop me a line and I'll try to point you in the right direction.


Who said this man can't write good documentation! Very clear Norm! Thanks.

—Posted by Dave Pawson on 29 Aug 2009 @ 07:04 UTC #

Thanks for fixing the comments, Norm. I only wanted to say that we've been pretty successful in actually using a query to set up the forest/database etc., which would enable you to provide an extra level of automation in the setup.

—Posted by Jeni Tennison on 01 Sep 2009 @ 06:56 UTC #

  xquery version "1.0-ml";
  declare namespace t="http://www.marklogic.com/ns/nwalsh/twitter/tweets";

—Posted by Klortho on 03 Apr 2010 @ 06:17 UTC #

Never mind -- I see that it's a bit of a dumb question. After spending more time with it, I see that "/*", e.g. works by returning all the document-level elements in the database. It seems very powerful -- I just hadn't run across it in the documentation I've read so far.

—Posted by Klortho on 04 Apr 2010 @ 01:56 UTC #