Implementing AtomPub

Volume 12, Issue 3; 23 Jan 2009; last modified 13 Dec 2011

A few weeks ago, I decided to build a conformant AtomPub server implementation on MarkLogic Server. Mostly for fun, but partly with an eye towards using it for some future reimplementation of this weblog. In any event, it's up and running on my test server.

Implementing AtomPub in MarkLogic Server was a fun little project. The first 90% of the exercise took about two days; the remaining 10% took about a week and a half. Such is the way of fun little projects.

The executive summary: dead easy to implement in MarkLogic Server. I built a flexible, conformant AtomPub server in less than a thousand lines of XQuery. When I get a chance, I'll write up some documentation for it and put it on the Mark Logic Developer Network.

The only tricky part, really, was getting the security right. But when isn't it tricky to get security right?

It's very convenient in a lot of applications to rely on “application level” security. (Note that I said “convenient”. I didn't say “wise” or “best”.) You give all your XQuery code full privileges to the whole system and rely on your coding skills to manage access. This is very flexible and convenient, but it doesn't work for AtomPub.

AtomPub clients expect to use HTTP authentication to gain access to the server, so that's what you have to provide. Unlike a human user on a web browser, where you might implement a floating, “web 2.0” style login box (or its accessible equivalent), for a machine operating over a wire protocol, you have to reply with and respond to the proper HTTP authentication challenges.
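To make "the proper HTTP authentication challenges" concrete, here is a minimal sketch of the server side of that exchange. It is written in Python rather than XQuery and is not the code running on the server described here. The Basic scheme, the `weblog` realm, the plain-text `users` table, and the function name `check_basic_auth` are all assumptions made for illustration.

```python
import base64


def check_basic_auth(headers, users, realm="weblog"):
    """Return (status, response_headers) for a request's Authorization header.

    `users` is a hypothetical {username: password} table; a real server
    would check credentials against its own security database.
    """
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() == "basic" and token:
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (ValueError, UnicodeDecodeError):
            decoded = ""
        username, sep, password = decoded.partition(":")
        if sep and users.get(username) == password:
            return 200, {}
    # No credentials, or bad ones: challenge the client so it can retry.
    return 401, {"WWW-Authenticate": f'Basic realm="{realm}"'}
```

A client that sends no `Authorization` header gets the 401 and the challenge, then repeats the request with credentials. That round trip is what an AtomPub client expects instead of a login form.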

Generally speaking, what this means is that you have to provide two URIs for each resource on the server: one URI provides read-only, public access, the other provides authenticated read-write access.

If you're developing on an Apache server (and I assume the same is true for a lot of other servers), it's often convenient to do this by hacking the path component and using .htaccess files. So, for example, http://example.com/path/to/entry is available to anyone, and http://example.com/edit/path/to/entry is the same entry protected by authentication.

In the context of MarkLogic Server, the most straightforward way to do this is with two application servers running against the same database. You can see this in my test environment. The server at http://microwave.homedns.org:8600/ requires no authentication but also has no privileges to edit any files on the server. The server at http://microwave.homedns.org:8601/ requires authentication, and users who successfully authenticate have privileges to edit their documents.
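The two ways of pairing a public URI with its authenticated twin (a path prefix, or a second port) can be sketched as simple URI rewrites. This is a Python illustration only. The function names are hypothetical, and the `/edit` prefix and the port numbers are taken from the examples above.

```python
from urllib.parse import urlsplit, urlunsplit


def edit_uri_by_path(public_uri, prefix="/edit"):
    """Apache-style: the protected copy lives under a path prefix."""
    parts = urlsplit(public_uri)
    return urlunsplit(parts._replace(path=prefix + parts.path))


def edit_uri_by_port(public_uri, edit_port):
    """Two-server style: same path, different (authenticated) port."""
    parts = urlsplit(public_uri)
    return urlunsplit(parts._replace(netloc=f"{parts.hostname}:{edit_port}"))
```

For example, `edit_uri_by_path("http://example.com/path/to/entry")` gives `http://example.com/edit/path/to/entry`.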

How does it work?

There are basically six modules plus a little ancillary code. An incoming request is caught by the error-handler.xqy module and dispatched appropriately by calling functions in the atompub.xqy module. HTTP PUT and POST are handled by separate modules (with the uninspired names put.xqy and post.xqy). This allows the actual code run for PUT and POST to be configured on a per-feed basis, because flexibility is a good thing. Each of these modules calls validate.xqy to determine if the incoming content is acceptable. Again, this is a separate module for flexibility. A format.xqy module is invoked when a GET is made against the “alternate” link of an entry.
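The shape of that dispatch, with per-feed PUT and POST handlers and a shared validation step, might look roughly like the following. This is a Python sketch of the idea, not a translation of the XQuery modules. Every function name, status code, and message here is an assumption, and DELETE and the other parts of the protocol are left out.

```python
def validate(body):
    """Stand-in for validate.xqy: accept any non-empty body."""
    return bool(body and body.strip())


def handle_post(path, body):
    """Stand-in for post.xqy: create a new entry in a collection."""
    return 201, f"created entry in {path}"


def handle_put(path, body):
    """Stand-in for put.xqy: replace an existing entry."""
    return 200, f"updated {path}"


DEFAULT_HANDLERS = {"POST": handle_post, "PUT": handle_put}


def dispatch(method, path, body=None, feed_handlers=None):
    """Route a request; `feed_handlers` overrides PUT/POST for one feed."""
    handlers = {**DEFAULT_HANDLERS, **(feed_handlers or {})}
    if method == "GET":
        return 200, f"representation of {path}"
    handler = handlers.get(method)
    if handler is None:
        return 405, "method not handled in this sketch"
    if not validate(body):
        return 400, "entry failed validation"
    return handler(path, body)
```

Passing a different `feed_handlers` mapping for a particular feed is the sketch's equivalent of configuring which module runs for PUT or POST on that feed.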

Out of the box, validation and formatting are designed to work with plain text or (X)HTML entries. One of my longer-term goals is to reimplement this weblog on top of MarkLogic Server. When I do that, I'll customize my weblog to validate and format the DocBook extension that I use for authoring.

Security-wise, there's a joepublic user with the weblog-reader role. That's the default user on the server on port 8600. The weblog-reader role grants just enough privileges to run the AtomPub code.

Each user that's created has three roles: weblog-reader, weblog-editor, and weblog-editor-username. The weblog-editor role identifies the user as an editor while the weblog-editor-username role gives them the URI privilege necessary to write to their part of the database. (This is what prevents you from logging in as an editor and then writing entries in my part of the database.)
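The effect of the three roles can be modelled in a few lines. The Python sketch below is only a model: in MarkLogic the check is enforced by the server's own security layer, not by application code. Reading weblog-editor-username as a per-user role name, and the `/users/<name>/` URI layout, are both assumptions.

```python
def roles_for(username):
    """The three roles a newly created user would hold."""
    return {"weblog-reader", "weblog-editor", f"weblog-editor-{username}"}


def can_write(roles, uri, root="/users/"):
    """Allow a write only under the URI prefix tied to a per-user role."""
    if "weblog-editor" not in roles:
        return False
    prefix = "weblog-editor-"
    for role in roles:
        if role.startswith(prefix):
            owner = role[len(prefix):]
            # Hypothetical layout: each user's documents live under root/owner/.
            if owner and uri.startswith(f"{root}{owner}/"):
                return True
    return False
```

With these definitions, `can_write(roles_for("alice"), "/users/alice/entry1.xml")` is true, while the same roles cannot write under another user's prefix, and joepublic's lone weblog-reader role cannot write anywhere.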

User Administration

The final detail, and honestly the last 10% that took the other 90% of the time, is the “admin” interface, such as it is. You can create an account on the server by filling out the form on the homepage and following the link that will be emailed to you.

If you've really been paying attention, you'll note that these admin tasks run on port 8600, which I earlier said had only read-only access to the server. So how can it create new accounts?

The answer is that the server's security API is sufficiently powerful that I can amplify the privileges of an individual function. These “amped” functions allow the application author to provide additional privileges in a very localized fashion. So there are two functions (request-user and create-user) that can edit the configuration file even when run by ordinary mortals (specifically joepublic). (And a special thanks to Danny Sokolsky, our Technical Documentation Manager, for guiding me through an embarrassingly long series of bone-headed attempts on my part to get this working correctly.)

If a user doesn't have a service.xml document (and none of the users do since I haven't provided an admin API for creating or editing one), they get a default one. The default one has two collections, one ordinary collection and one media collection.
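For readers who haven't seen one, a default service document of that shape (one workspace holding an entry collection and a media collection) could be generated as follows. This is a Python sketch using the RFC 5023 vocabulary. The collection names, the titles, the URI layout, and the `*/*` accept value for the media collection are assumptions, not what the server actually returns.

```python
import xml.etree.ElementTree as ET

APP = "http://www.w3.org/2007/app"    # AtomPub namespace (RFC 5023)
ATOM = "http://www.w3.org/2005/Atom"  # Atom namespace (RFC 4287)


def default_service(base, username):
    """Build a service document with one entry and one media collection."""
    ET.register_namespace("app", APP)
    ET.register_namespace("atom", ATOM)
    service = ET.Element(f"{{{APP}}}service")
    workspace = ET.SubElement(service, f"{{{APP}}}workspace")
    ET.SubElement(workspace, f"{{{ATOM}}}title").text = f"{username}'s weblog"
    collections = (
        ("entries", "application/atom+xml;type=entry"),  # ordinary collection
        ("media", "*/*"),                                # media collection
    )
    for name, accept in collections:
        coll = ET.SubElement(
            workspace, f"{{{APP}}}collection", href=f"{base}/{username}/{name}"
        )
        ET.SubElement(coll, f"{{{ATOM}}}title").text = name
        ET.SubElement(coll, f"{{{APP}}}accept").text = accept
    return ET.tostring(service, encoding="unicode")
```

The `app:accept` element is what tells a client which collection takes Atom entries and which takes arbitrary media.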

Give it a whirl!

My implementation passes Joe Gregorio’s APP Test Client and Tim Bray’s Atom Protocol Exerciser, so I think it's ready for real-world use.

Feel free to give it a try on Microwave. Report any problems that you encounter, naturally. It's quite possible that I've misinterpreted parts of RFC 5023.

Fair warning: Microwave is a spare box in my house. It's bloody noisy, so I keep it in the hallway and I turn it off at night so that it doesn't keep me awake. It's usually online between about 7:00a and 10:00p EST, but I'm not making any long-term promises about it.

Comments

"AtomPub clients expect to use HTTP authentication to gain access to the server, so that's what you have to provide."

Any thoughts on feasibility of plugging in OAuth here?

—Posted by Dan Brickley on 23 Jan 2009 @ 08:57 UTC #

Sure, I think you could plug in OAuth, if you had a client that was prepared to use OAuth. If someone points me to an OAuth-aware client that I can run, I'll try to get OAuth working on the server end.

—Posted by Norman Walsh on 23 Jan 2009 @ 09:02 UTC #

MarkLogic should probably ship with an AtomPub setup out of the box, as this is one of the reasons why people choose eXist over MarkLogic today.

—Posted by Keith Fahlgren on 25 Jan 2009 @ 07:36 UTC #

Hi Norman, in the company we've been working on OAuth Open Source framework implementation, the Consumer (the client) is quite stable, you can find it on

http://code.google.com/p/asmx-oauth

Sorry for the lack of documentation, but you can take a look on the consumer-sample to understand how to use it... APIs have been designed to be easy to use, but I'm really happy to help you on setting up your test environment, let me know!

—Posted by Simone Tripodi on 26 Jan 2009 @ 09:34 UTC #