<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
    
    
    
    
    
    
    
    
    
    
<title>Micro-blogging Backup, part the second</title><biblioid class="uri">http://norman.walsh.name/2009/08/28/mbb02</biblioid>
<volumenum>12</volumenum>
<issuenum>26</issuenum>
<pubdate>2009-08-28T14:16:44-04:00</pubdate>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2009</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>In which we set up the database one screen at a time and then
import our first status messages.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#MarkLogic"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#Microblogging"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#TheWeb"/>
</info>

<para xml:id="p1">If you were following along
<link xlink:href="/2009/08/27/mbb01">yesterday</link>, you've got
<link xlink:href="http://www.marklogic.com/product/marklogic-server.html">MarkLogic
Server</link> up and running with the 
<link xlink:href="http://developer.marklogic.com/about/whatiscis.xqy#community">Community License</link>. Now it's time to start putting it to work. (Cutting toothpicks
with a chainsaw, but hey, you have to start somewhere.)</para>

<para xml:id="p2">Well, almost. First, we have to do a little setup.</para>

<para xml:id="p3">Download <link xlink:href="examples/mbb02.zip">mbb02.zip</link> and unpack it
somewhere convenient. I chose <filename>/home/ndw/mbb</filename> for
the purposes of this example, but I'm not sure your home directory is
really the best place. Put it anywhere you'd like, though; it doesn't matter to
me.</para>

<para xml:id="p4">Fire up your favorite web browser and connect to the admin interface
on port 8001
(<link xlink:href="http://localhost:8001/"/>, probably); you'll need to log in
with whatever userid/password combination you selected at installation
time.</para>

<para xml:id="p5">Once you're there, click “Forests” in the “Configure” tree
control in the left-hand column and then select the “Create” tab.
Enter any name you'd like for the forest and click “ok”. I named mine
“mbb”.</para>

<mediaobject role="flickr">
    <!--Create a forest-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3865078512/">
    <imagedata fileref="http://farm4.static.flickr.com/3186/3865078512_ee28a5cdb4.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p6">Forests are where the server stores XML documents. Trees, as it were.
Clever, eh?</para>

<para xml:id="p7">Next, choose “Databases” in the tree control and select the “Create”
tab again. Enter any name you'd like for the database and click “ok”. I named
mine “mbb”.  I can't think of a compelling reason to give them different names,
but suit yourself.</para>

<mediaobject role="flickr">
    <!--Create a database-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3864296197/">
    <imagedata fileref="http://farm3.static.flickr.com/2666/3864296197_8b9abcd82d.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p8">Once you've created a database, you'll be reminded that you need to attach
a forest to the database.</para>

<mediaobject role="flickr">
    <!--You must attach a forest to the database-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3865078752/">
    <imagedata fileref="http://farm3.static.flickr.com/2476/3865078752_c0be6ca29b.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p9">Click on that link and do so. Remember to click “ok”.</para>

<mediaobject role="flickr">
    <!--Attach the forest you created-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3864296397/">
    <imagedata fileref="http://farm3.static.flickr.com/2576/3864296397_fc437a339d.jpg"/>
  </imageobject>
</mediaobject>
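
<para>The forest and database steps above can also be scripted rather than
clicked through. What follows is only a minimal sketch and is not part of
<filename>mbb02.zip</filename>: it assumes a MarkLogic release that ships the
Admin API module (<filename>/MarkLogic/admin.xqy</filename>), the name “mbb”
used in this example for both the forest and the database, and the stock
“Security” and “Schemas” databases. Run it from any XQuery environment where
you have admin rights. The work is split into two transactions, separated by a
semicolon, so that the forest and database exist before they are attached.</para>

<programlisting language="xquery"><![CDATA[xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Transaction 1: create the forest and the database. :)
let $config := admin:get-configuration()
(: "mbb" is the forest name from this example; () means the default data directory. :)
let $config := admin:forest-create($config, "mbb", xdmp:host(), ())
(: Assumes the stock Security and Schemas databases. :)
let $config := admin:database-create($config, "mbb",
                 xdmp:database("Security"), xdmp:database("Schemas"))
return admin:save-configuration($config);

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Transaction 2: attach the forest to the database. :)
let $config := admin:get-configuration()
let $config := admin:database-attach-forest($config,
                 xdmp:database("mbb"), xdmp:forest("mbb"))
return admin:save-configuration($config)]]></programlisting>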

<para xml:id="p10">Almost there. Choose “Groups”, “Default”, and “App Servers” in the tree
control, then select the “Create HTTP” tab. Enter any name you'd like for
the server name (I named mine “mbb”); for the root, enter the location where
you unpacked the zip file (I used <filename>/home/ndw/mbb</filename>);
and for the port, enter any open port (I used “8330”).</para>

<para xml:id="p11">But <emphasis>don't</emphasis> click “ok” just yet. (If you already did, 
no worries, 
just click on the app server's name in the tree control.)</para>

<mediaobject role="flickr">
    <!--Create an HTTP application server-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3865078948/">
    <imagedata fileref="http://farm3.static.flickr.com/2533/3865078948_347c5a8d72.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p12">Scroll about halfway down the page to change the authentication and
default user. Select “application level” for the authentication scheme and “admin”
for the default user.</para>

<mediaobject role="flickr">
    <!--Change the authentication to application-level-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3865079022/">
    <imagedata fileref="http://farm3.static.flickr.com/2605/3865079022_9250372166.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p13">This gives your application complete access to the server without having
to log in. There
are lots of ways to make an application more secure, 
but let's leave all the security knobs for another day. Now scroll to the
top or bottom and click “ok”.</para>
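
<para>For the curious, the app server settings above map onto Admin API calls
as well. This is a hedged sketch, not code from the distribution: it assumes
the “Default” group, the “mbb” names, the root
<filename>/home/ndw/mbb</filename>, and port 8330 from this example, plus a
user actually named “admin”. Substitute your own values.</para>

<programlisting language="xquery"><![CDATA[xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Transaction 1: create the HTTP app server in the Default group. :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
let $config := admin:http-server-create($config, $group, "mbb",
                 "/home/ndw/mbb",       (: root: where the zip was unpacked :)
                 8330,                  (: any open port :)
                 0,                     (: 0 = load modules from the filesystem :)
                 xdmp:database("mbb"))  (: content database :)
return admin:save-configuration($config);

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Transaction 2: application-level authentication with a default user. :)
let $config := admin:get-configuration()
let $group  := admin:group-get-id($config, "Default")
let $server := admin:appserver-get-id($config, $group, "mbb")
let $config := admin:appserver-set-authentication($config, $server,
                 "application-level")
(: Assumes a user named "admin" exists; xdmp:user returns its ID. :)
let $config := admin:appserver-set-default-user($config, $server,
                 xdmp:user("admin"))
return admin:save-configuration($config)]]></programlisting>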

<para xml:id="p14">At this point, you have a real honest-to-goodness application running
on your server. (And yeah, this should all be simpler and easier. I've heard
tell of plans to improve it, but nothing I can swear to.)</para>

<para xml:id="p15">I included a copy of “CQ”, a browser-based, interactive
XQuery environment, in the distribution.
You can see it if you navigate your browser to
<link xlink:href="http://localhost:8330/cq"/>. (In this and all the following
examples, if you chose a different
port, use the port number you chose.)</para>

<para xml:id="p16">If you click on the “explore” link at the top of the CQ page, you'll
see that you've got an empty database.</para>

<mediaobject role="flickr">
    <!--CQ shows the empty database-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3865079102/">
    <imagedata fileref="http://farm4.static.flickr.com/3503/3865079102_6394379522.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p17">Now it's time to configure this particular database for our
micro-blogging backup application. Later on, we're going to need some
indexes. You could walk through the admin UI to create them, but
that's tedious, you only have to do this once, and the admin UI is
completely scriptable, so I created a little query to do the grunt
work.
</para>
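
<para>To give a feel for what scripting the admin UI looks like, here is the
general shape of adding a range index with the Admin API. It is an
illustration only: the real indexes are whatever
<filename>init/setup-database.xqy</filename> defines, and the element name
<literal>id</literal> and the <literal>unsignedLong</literal> type below are
hypothetical stand-ins, as is the database name “mbb”.</para>

<programlisting language="xquery"><![CDATA[xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
(: Describe the index. The element name and type are hypothetical. :)
let $index  := admin:database-range-element-index(
                 "unsignedLong",  (: scalar type :)
                 "",              (: element namespace: none :)
                 "id",            (: element local name :)
                 "",              (: collation: only relevant for string indexes :)
                 fn:false())      (: do not store range value positions :)
(: Add it to the database created earlier and save the configuration. :)
let $config := admin:database-add-range-element-index($config,
                 xdmp:database("mbb"), $index)
return admin:save-configuration($config)]]></programlisting>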

<para xml:id="p18">Point your web browser at the database configuration script: 
<link xlink:href="http://localhost:8330/init/setup-database.xqy"/>.
If everything is set up correctly, you'll quickly get a “database configured”
message.</para>

<mediaobject role="flickr">
    <!--Configure the database-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3865079144/">
    <imagedata fileref="http://farm3.static.flickr.com/2467/3865079144_9dabcc9ed1.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p19">Next, we need to configure the microblogging accounts that you want to
back up. As with database configuration, you're probably only going to do this once
(or at least once in a great while), so I didn't create any sort of UI for it.
</para>

<para xml:id="p20">In the directory where you unpacked <filename>mbb02.zip</filename>, open
up <filename>init/setup-accounts.xqy</filename> with your favorite text
editor. On lines 57 and 58, replace <literal>SCREEN_NAME</literal> and
<literal>PASSWORD</literal> with the username and password of the
<link xlink:href="http://twitter.com/">Twitter</link> account
that you want to back up.</para>

<para xml:id="p21">If you're using <link xlink:href="http://identi.ca/">Identi.ca</link>
instead, you'll have to do a little more editing, but it should be pretty
straightforward. If you're using your own install of the Laconica software,
or you're using some other microblogging server, as long as it supports the
Twitter API, you should be able to figure out what to do. Feel free to ask
if you're not sure.</para>

<para xml:id="p22">When you've got all your accounts in place, save the file and point your
web browser at it: 
<link xlink:href="http://localhost:8330/init/setup-accounts.xqy"/>.
</para>

<mediaobject role="flickr">
    <!--Configure your accounts-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3865079190/">
    <imagedata fileref="http://farm4.static.flickr.com/3493/3865079190_077522b8dc.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p23">If all goes well, you'll get an appropriate “Accounts
initialized” message. If you get 500 errors, you probably messed up the XQuery syntax
somewhere. It won't do any harm to run the account setup script more than once,
so try making small changes and rerunning the script after each one.
If you get stuck, let me know.
</para>

<para xml:id="p24">If you go back to CQ again and click the “explore” link, you'll see that
there are documents in the database now, one for each account you added.</para>

<mediaobject role="flickr">
    <!--CQ shows the database with one document-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3864296927/">
    <imagedata fileref="http://farm4.static.flickr.com/3452/3864296927_3f17eaca53.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p25">Now we're ready to <emphasis>really</emphasis> do something.</para>

<para xml:id="p26">Point your web browser at
<link xlink:href="http://localhost:8330/get-tweets.xqy"/> to download your
status messages. This may take a while, especially the first time and especially
if you entered several accounts.</para>

<mediaobject role="flickr">
    <!--Download the status messages for your account(s)-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3864297111/">
    <imagedata fileref="http://farm4.static.flickr.com/3239/3864297111_18b9b23031.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p27">If you get a message about “rate limit exceeded”, it means you've made too
many requests to the Twitter API this hour. Wait a bit and try again.
Twitter threatens that they'll turn off your account if you flagrantly violate
the rate limit, so the MBB queries are pretty careful not to.
</para>

<para xml:id="p28">The “explore” link in CQ will now show a whole bunch of documents in 
the database.</para>

<mediaobject role="flickr">
    <!--CQ shows a database full of documents-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3864297285/">
    <imagedata fileref="http://farm3.static.flickr.com/2643/3864297285_3d60b8eba4.jpg"/>
  </imageobject>
</mediaobject>

<para xml:id="p29">You can enter any arbitrary XQuery expressions you'd like into CQ.
Here I've asked for a count of all the messages that I've “favorited”.</para>

<mediaobject role="flickr">
    <!--Arbitrary XQuery expressions evaluated by CQ-->
  <imageobject xlink:href="http://www.flickr.com/photos/ndw/3864297405/">
    <imagedata fileref="http://farm4.static.flickr.com/3251/3864297405_f7c7e9ea74.jpg"/>
  </imageobject>
</mediaobject>
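
<para>A query along the following lines would produce such a count. Treat it as
a sketch under an assumption: that each message is stored as a
<literal>status</literal> document with a <literal>favorited</literal> child,
as in Twitter's XML API responses. If MBB stores messages in a different shape,
adjust the path to match what “explore” shows you.</para>

<programlisting language="xquery"><![CDATA[xquery version "1.0-ml";

(: Count the stored messages marked as favorited.
   Assumes one <status> document per message, with a
   <favorited>true</favorited> or <favorited>false</favorited> child. :)
count(/status[favorited = "true"])]]></programlisting>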

<para xml:id="p30">In the next parts, we'll look at some of the code behind this
functionality in a little more detail, add some XQuery to display the
messages, look at how we can augment the messages in useful ways, add
searching, and finally pull the pieces together into a useful little
app. Well, a little app I think is useful, anyway.</para>

<section xml:id="old">
<title>What about my older messages?</title>

<para xml:id="p31">Twitter only lets you get at the last 3,200 or so status
messages with the Twitter API. If you've got older status messages that you've
already backed up, or if you can find some other API to get at them, there
are other ways to get them in the database.</para>

<para xml:id="p32">I left the skeleton of one of those ways in the <filename>init</filename>
directory, an XProc pipeline that <link xlink:href="http://xmlcalabash.com/">XML
Calabash</link> can run to load status messages from existing XML files.
</para>

<para xml:id="p33">If you've got your old tweets archived in XML, drop me a line and I'll
try to point you in the right direction.</para>
</section>
</essay>

