WITW: NSDL

Volume 8, Issue 40; 12 Mar 2005; last modified 08 Oct 2010

Norm's Service Description Language (staggeringly original name, I know) is my experiment with a simpler web services description language.

Time makes more converts than reason.

—Thomas Paine

Back when WSDL defeated me, I realized even in my defeat that some sort of description language was necessary. It must be possible to describe services so that compilers can build interfaces to them; that's the only way to make them accessible to “ordinary programmers” who don't care about web services for web services' sake.

There seem to be two main requirements:

Make it possible for ordinary programmers to use web services as transparently as they use other code libraries.
Make it possible for ordinary service providers to describe their interfaces in a standard way so that some level of interoperability can be achieved.

A little web searching will reveal that I'm not the first to have this idea. And there may be existing “off the shelf” solutions that already satisfy those requirements. But Where in the World isn't about the getting done, it's about the doing. To that end, I decided to see if I could tackle the problem, if I could not only describe a solution, but build it too.

Going back to my roots, I decided that I'd attempt to describe services that are directly accessible via GET or POST over HTTP. That means no fancy binding specifications, abstract port descriptions, arbitrary intermediaries, or who knows what else. I've got no hope of getting all that stuff right before someone can explain why it's actually needed anyway. (I won't attempt to dispute with any authority that it is needed, but I don't need it and I don't understand it.)

Sketching a service description

Although, in the modern style, Perl and Python functions often take named parameters, I think positional parameters are still the most natural to most programmers. For the HTTP GET case then, I think this reduces the problem to one of mapping positional parameters on a function invocation to named parameters on an HTTP URI.

The programmer's use of user('ndw') has to be translated to an HTTP GET of http://norman.walsh.name/2005/02/witw/is?userid=ndw and then some part of the result has to be returned as a scalar value.

Here's how I describe that in NSDL. First, the service:


<service name="user"
	 action="get"
	 uri="http://norman.walsh.name/2005/02/witw/is?">

The service defines a method named user, is invoked with an HTTP GET, and has the URI specified. Next, the parameters that this service can have must be identified:

  <request>
    <parameter name="userid" type="xsd:string"/>
    <parameter name="nearby" type="xsd:integer" default='0' optional="yes"/>
  </request>

The positional parameters in the method invocation get mapped to the list of parameters in the request block. In this case, the first parameter is the value of userid. The second, optional, parameter is the value of nearby. If it isn't specified, it will default to 0.
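
As a rough sketch of that mapping (not the actual NSDL::Request code), positional arguments can be paired with the declared parameters in order. The build_get_uri helper and the hash layout used for the parameter declarations are made up for illustration, and I'm assuming that an omitted optional parameter is sent with its default rather than left off the URI. Only core Perl is used:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# (Treats the value as single-byte characters; fine for ASCII input.)
sub uri_escape {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $value;
}

# Hypothetical helper: pair positional arguments with the declared
# parameters, in order, and append them to the base URI as a query string.
sub build_get_uri {
    my ($base, $params, @args) = @_;
    die "Too many arguments\n" if @args > @$params;

    my @pairs;
    for my $i (0 .. $#$params) {
        my $param = $params->[$i];
        my $value = $i < @args ? $args[$i] : undef;
        if (!defined $value) {
            die "Missing required parameter '$param->{name}'\n"
                unless $param->{optional};
            # Assumption: an omitted optional parameter is sent with its default.
            $value = $param->{default};
            next unless defined $value;
        }
        push @pairs, uri_escape($param->{name}) . '=' . uri_escape($value);
    }
    return $base . join('&', @pairs);
}

# Parameter declarations mirroring the request block for the user service.
my @params = (
    { name => 'userid' },
    { name => 'nearby', optional => 1, default => 0 },
);

print build_get_uri('http://norman.walsh.name/2005/02/witw/is?',
                    \@params, 'ndw'), "\n";
```

Keeping the declarations in an ordered list is what makes the positional mapping work: the argument's index selects the parameter name.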

Finally, something has to be returned. That's identified in the response:

  <response>
    <result select="/is:is/is:user/is:name"/>
  </response>

If all goes well, the value returned by this method will be the value of that XPath expression given in the select attribute as applied to the document returned by the service.

But what if something goes wrong? What if the service doesn't return the expected value? The response can be augmented to look for errors:

  <response>
    <fault name="baduserid" select="//is:unknown-user"/>
    <fault name="invalid" select="//is:invalid-request"/>
    <result select="/is:is/is:user/is:name"/>
  </response>

Now the service will “fault” with a “baduserid” or “invalid” code if either of those XPath expressions matches the result. (Fault handling isn't the strongest suit of my implementation, I admit.)
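
One plausible way to process such a response block with XML::LibXML is to try each fault expression first and only then evaluate the result. This is a sketch, not the real implementation: the namespace URI bound to the is prefix, the evaluate_response function, and both sample documents are placeholders invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Placeholder namespace: the real URI bound to the "is" prefix isn't shown here.
my $IS_NS = 'http://example.org/ns/is';

my @faults = (
    { name => 'baduserid', select => '//is:unknown-user' },
    { name => 'invalid',   select => '//is:invalid-request' },
);
my $result_select = '/is:is/is:user/is:name';

# Hypothetical function: die with the fault name if any fault expression
# selects a node; otherwise return the string value of the result.
sub evaluate_response {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs('is', $IS_NS);

    for my $fault (@faults) {
        my @hits = $xpc->findnodes($fault->{select});
        die "fault: $fault->{name}\n" if @hits;
    }
    return $xpc->findvalue($result_select);
}

# Made-up sample responses, one good and one faulting.
my $ok  = qq{<is xmlns="$IS_NS"><user userid="ndw"><name>Norman Walsh</name></user></is>};
my $bad = qq{<is xmlns="$IS_NS"><unknown-user/></is>};

print evaluate_response($ok), "\n";

my $value = eval { evaluate_response($bad) };
print "caught $@" unless defined $value;
```

Raising the fault as an exception is just one choice; returning a fault code alongside the result would work as well.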

Parameter typing

If you're observant and have a good memory, you may have noticed two things about parameter types: first, that they're defined using W3C XML Schema data types and second, that the type of nearby is wrong. The lexical space of nearby should be limited to exactly “0” or “1”.

With respect to the first observation, you're absolutely right. But I'm actually accomplishing this with RELAX NG. Partly, I admit, out of a desire to prove that RELAX NG is as reasonable a validation technology for web services as any other. But also partly because libxml provides a RELAX NG validator.

You're absolutely right about the second observation, too, but that can be fixed now. First, add a new section to the service description file that defines the additional type. (Yes, “type” is a misnomer. It'd more properly be called a “pattern” in RELAX NG parlance. Humor me, ok?)

<types xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <rng:define name="DigitBoolean">
    <rng:choice>
      <rng:value>0</rng:value>
      <rng:value>1</rng:value>
    </rng:choice>
  </rng:define>
</types>

Then change the type of the request parameter:

  <request>
    <parameter name="userid" type="xsd:string"/>
    <parameter name="nearby" type="DigitBoolean" default='0' optional="yes"/>
  </request>

Now the values are properly constrained. This is probably a good place to note that I could have added type checking to the results as well. It'd be pretty straightforward to add a type attribute and check the results using the same technique I'm using to check the parameters, but I didn't bother. I wouldn't learn anything new from the exercise.
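
For the curious, here is one way a single parameter value could be checked against a RELAX NG pattern with XML::LibXML. It's a sketch under assumptions: wrapping the value in a throwaway value element and validating that tiny document is a technique I'm assuming for illustration, and check_param is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Wrap the DigitBoolean pattern in a grammar whose only valid documents are
# <value>0</value> and <value>1</value>. The wrapper element name "value"
# is made up for this sketch.
my $grammar = <<'RNG';
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="value">
      <ref name="DigitBoolean"/>
    </element>
  </start>
  <define name="DigitBoolean">
    <choice>
      <value>0</value>
      <value>1</value>
    </choice>
  </define>
</grammar>
RNG

my $rng = XML::LibXML::RelaxNG->new(string => $grammar);

# Hypothetical helper: returns 1 if the value matches the pattern, 0 otherwise.
sub check_param {
    my ($value) = @_;
    my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $root = $doc->createElement('value');
    $root->appendText($value);
    $doc->setDocumentElement($root);
    # validate() dies when the document doesn't match the grammar.
    return eval { $rng->validate($doc); 1 } ? 1 : 0;
}

for my $candidate (qw(0 1 2 yes)) {
    printf "%-3s %s\n", $candidate, check_param($candidate) ? 'ok' : 'rejected';
}
```

Building the wrapper document through the DOM rather than by string concatenation avoids having to escape the parameter value by hand.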

Multiple results

Sometimes it's convenient for a single web service invocation to return multiple results. The same GET that will return the user name from WITW also returns the latitude, longitude, date, mailbox, and a host of other information. Rather than requiring that the service provider decompose the service into individual methods, a service can return multiple results:

  <response>
    <fault name="baduserid" select="//is:unknown-user"/>
    <fault name="invalid" select="//is:invalid-request"/>

    <result name="name" select="/is:is/is:user/is:name"/>
    <result name="userid" select="/is:is/is:user/@userid"/>
    <result name="uri" select="/is:is/is:user/is:uri"/>
    <result xmlns:foaf="http://xmlns.com/foaf/0.1/"
	    name="mailbox" select="/is:is/is:user/foaf:mbox_sha1sum"/>
    <result name="lat" select="/is:is/is:locations/is:location/@lat"/>
    <result name="long" select="/is:is/is:locations/is:location/@long"/>
    <result name="date" select="/is:is/is:locations/is:location/@date"/>
  </response>

That doesn't actually tell the implementation how to provide access to those results, but that's going to have to vary on a per-implementation-language basis anyway. For my implementation, I'm going to return a “response object” that will have access methods for those named results.

Speaking of multiple results, what should we do about XPath expressions that select multiple nodes? Suppose, for example, that we wanted to return all the landmarks?

I thought about this and decided to punt a bit. First, it seems to me that even though we're hiding the web services aspect of this library, we don't need to make it impossible to access. So if you need to get the XML to extract complex results, that should be possible. Then, for multiple nodes, I decided that the easiest thing to do was return an array of results, with each result being the string value of the selected node. It's not perfect, but it'll do for now, at least for dynamic languages like Perl. For statically typed languages, I think a different approach would be required.
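
To make the “response object” idea concrete, here's a small sketch that uses AUTOLOAD to turn each named result into an accessor, returning a plain string for a single node and an array reference for several. The Sketch::Response package is not the real NSDL::Response, and the namespace URI, the landmark element names, and the sample document are all invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

{
    # Hypothetical stand-in for a response class.
    package Sketch::Response;
    our $AUTOLOAD;

    sub new {
        my ($class, $xpc, $results) = @_;
        return bless { xpc => $xpc, results => $results }, $class;
    }

    # Any method call that isn't defined is treated as a result name.
    sub AUTOLOAD {
        my ($self) = @_;
        (my $name = $AUTOLOAD) =~ s/.*:://;
        my $select = $self->{results}{$name}
            or die "No result named '$name'\n";
        my @values = map { $_->textContent } $self->{xpc}->findnodes($select);
        # One node: a scalar. Several nodes: an array reference.
        return @values > 1 ? \@values : $values[0];
    }

    sub DESTROY { }    # keep object destruction out of AUTOLOAD
}

# Placeholder namespace and made-up sample document.
my $IS_NS = 'http://example.org/ns/is';
my $doc = XML::LibXML->load_xml(string => <<"XML");
<is xmlns="$IS_NS">
  <user userid="ndw"><name>Norman Walsh</name></user>
  <landmarks>
    <landmark>Town Hall</landmark>
    <landmark>Public Library</landmark>
  </landmarks>
</is>
XML

my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs('is', $IS_NS);

my $res = Sketch::Response->new($xpc, {
    name      => '/is:is/is:user/is:name',
    userid    => '/is:is/is:user/@userid',
    landmarks => '/is:is/is:landmarks/is:landmark',
});

print $res->name(), " (", $res->userid(), ")\n";
my $landmarks = $res->landmarks();
print "  $_\n" for ref $landmarks ? @$landmarks : ($landmarks);
```

The caller still has to test with ref to find out whether it got one value or many, which is exactly the wrinkle that a statically typed language would need to handle differently.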

What about POST?

So far, all the examples use GET, which just uses URL-encoded parameters. What about supporting POST, where there will need to be some sort of message body? To do that, I added a body element to the request. Here's the request block for the “where am I now” service that uses POST to update my position:

  <request>
    <parameter name="lat" type="Latitude"/>
    <parameter name="long" type="Longitude"/>

    <body>
      <location xmlns="http://nwalsh.com/xmlns/witw-post#">
	<latlong>
	  <lat>{$lat}</lat>
	  <long>{$long}</long>
	</latlong>
      </location>
    </body>
  </request>

As you can probably guess, the contents of the body are sent in the POST, subject to an XSLT- or Ant-style “value template” expansion.
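
A minimal sketch of that expansion, assuming the body is available as a string and that each {$name} placeholder is replaced by the XML-escaped value of the parameter with that name. The expand_template, lookup, and xml_escape helpers are hypothetical, and whether the real implementation escapes values is not something I can promise:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are unsafe in XML element content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Look up one placeholder name, failing loudly if no value was supplied.
sub lookup {
    my ($values, $name) = @_;
    die "No value for parameter '$name'\n" unless exists $values->{$name};
    return xml_escape($values->{$name});
}

# Hypothetical helper: replace every {$name} in the template.
sub expand_template {
    my ($template, $values) = @_;
    $template =~ s/\{\$(\w+)\}/lookup($values, $1)/ge;
    return $template;
}

# Single-quoted heredoc so Perl leaves {$lat} and {$long} alone.
my $body = <<'XML';
<location xmlns="http://nwalsh.com/xmlns/witw-post#">
  <latlong>
    <lat>{$lat}</lat>
    <long>{$long}</long>
  </latlong>
</location>
XML

print expand_template($body, { lat => '42.3382', long => '-72.4500' });
```

Failing on an unknown placeholder, rather than silently leaving it in the body, seems like the safer default for something that's about to be POSTed.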

A complete RELAX NG Grammar for NSDL is available.

Show me the code

Service description, parameters, results, XML, blah, blah, blah. Show me the code! Fair enough. My implementation is in Perl and consists of three modules, NSDL::Request, NSDL::Response, and NSDL::UA (for authentication).

Here's a program that uses the service description outlined above to print the name of any user from WITW:

#!/usr/bin/perl -w -- # -*- Perl -*-

use NSDL::Request;

my $userid = shift @ARGV
    || die "Usage: $0 userid\n";

my $req = new NSDL::Request();
$req->load('witw.nsd');

my $res = $req->user($userid);
print "$userid is $res\n";

I think that satisfies the first requirement. With a little code generation, I could simplify it further, removing the call to “load” and making a class specifically for the WITW services, but I'm not going to bother.

Taking advantage of the service description that returns multiple results, it can be written this way:

#!/usr/bin/perl -w -- # -*- Perl -*-

use NSDL::Request;

my $userid = shift @ARGV
    || die "Usage: $0 userid\n";

my $req = new NSDL::Request();
$req->load('witw.nsd');

my $res = $req->user($userid);
print "$userid is ", $res->name();
print " (", $res->mailbox(), ").\n";
print "Last seen on ", $res->date(), "\n";
print "at (";
print $res->lat(), ", ", $res->long();
print ")\n";

Which produces results like this:

ndw is Norman Walsh (9f5c771a25733700b2f96af4f8e6f35c9b0ad327).
Last seen on 2005-03-09T14:23:41Z
at (42.3382, -72.4500)

Updating my location is just as easy:

#!/usr/bin/perl -w -- # -*- Perl -*-

use NSDL::Request;

my $userid = shift @ARGV;
my $passwd = shift @ARGV;
my $lat = shift @ARGV;
my $long = shift @ARGV;

my $req = new NSDL::Request();
$req->load('witw.nsd');

$req->auth($userid, $passwd);
my $res = $req->ami($lat, $long);

Though in this case I have to provide authentication information so that the POST will succeed (and I haven't bothered with any error checking).

Implementation Details

In the course of building the implementation, I've tried to make it as self-contained and portable as possible. I found that the Perl interfaces to libxml, specifically XML::LibXML and XML::LibXML::XPathContext, provided almost everything I needed. The only other external dependencies are on LWP::UserAgent for HTTP support and IO::Scalar for some lazy string construction with print statements.

As an aside, I'm particularly impressed with the XML::LibXML family of packages. They're likely to become my new standards for working with XML in Perl. You get DOM, RELAX NG validation, and XPath support all in one. Nice work!

One More Example

Yeah, yeah, all well and good, you can write simple programs to access a toy web service. What about the real world? Ok, how about using NSDL to access Amazon?

With an appropriate description, we can write a short program to access Amazon books by author:

#!/usr/bin/perl -w -- # -*- Perl -*-

use NSDL::Request;

my $usage = "$0 amazonid author\n";

my $amazonid = shift @ARGV || die $usage;
my $author = shift @ARGV || die $usage;

my $req = new NSDL::Request();
$req->load('amazon.nsd');

my $res = $req->booksbyauthor($amazonid, $author);

printf "Amazon query returned %d results in %1.2fs:\n",
    $res->count(), $res->time();

my $titles = $res->titles();
if (ref $titles) {
    my $count = 1;
    foreach my $title (@{$titles}) {
	print "\t$count. $title\n";
	$count++;
    }
} else {
    print "\t$titles\n";
}

If you ask for books by Norman Walsh today, you get:

Amazon query returned 5 results in 0.07s:
        1. DocBook: The Definitive Guide (O'Reilly XML)
        2. Forensic Nursing and Mental Disorder in Clinical Practice
        3. Agent-Mediated Electronic Commerce IV. Designing Mechanisms and Systems : AAMAS 2002 Workshop on Agent Mediated Electronic Commerce, Bologna, Italy, J ... e / Lecture Notes in Artificial Intelligence)
        4. Docbook la reference
        5. Making TeX Work (A Nutshell Handbook)

There. (And three out of five ain't bad, I don't think.) I'm not going to think too hard about the fact that this search turns up DocBook, electronic commerce, and mental disorder.

I've satisfied my own curiosity about a simpler web services description language. And the implementation, though definitely no more robust than a “proof of concept”, wasn't that hard to cook up. Pointers to where I've gone totally off the rails are most welcome.

Comments

Nice work Norm; I was posting a comment that grew too long so I moved it to http://www.parand.com/say/?p=13 . Short version: how about specifying the template-able parts of the POST input as XPath, so you have some nice consistency, and I'd argue for removing type information, although I'm probably alone in thinking that.