Using XQuery in anger

Volume 11, Issue 49; 02 Jul 2008; last modified 08 Oct 2010

Since XQuery, in the form of my first real work project, has kept me busy for the past month or so, it seems logical to start blogging again with some lessons learned.

What I said before still applies: I remain impressed with XQuery as a programming language for building web applications on top of a product like Mark Logic Server.

I have a slightly odd relationship with XQuery because I was on the XSL Working Group when XPath 2.0, XSLT 2.0, and XQuery 1.0 were being defined. A lot of that work was done jointly, so I've seen a fair bit of XQuery go by, but I wasn't paying that much attention to the XQuery-only bits [Clearly! See below. -ed].

My first real work project was a proof-of-concept exercise. In broad strokes, I had to take collections of items from five or six different XML formats, present them reasonably well in several formats, provide a uniform search interface (full-text and by discrete facets) across all the formats, allow the user to select items and build a new document from those items, and provide some simple forms for adding new items or changing existing items.

It's a few thousand lines of XQuery (and some JavaScript libraries, but that's another topic) and it works just the way I want it to. The requirements shifted once or twice, and XQuery proved quite agile.

Which isn't to say it was without annoyances.

First, there's the issue that Micah identified. Careless use of path expressions littered some of my HTML with random source elements. For most of them, I found a cast to xs:string sufficient. For others, as Jeni points out in her comment, you really do need to do the recursive function thing.
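To make the first problem concrete, here's a minimal sketch. The src namespace URI, the $item variable, and the element names are hypothetical stand-ins for one of the source formats. An enclosed expression that returns a node copies that node, markup and all, into the element being constructed; atomizing it first keeps only the text.

```xquery
xquery version "1.0";

declare namespace src = "http://example.com/src";

(: Hypothetical source item; in a real application it would come
   from the database. :)
declare variable $item :=
  <item xmlns="http://example.com/src">
    <title>Using <em>XQuery</em> in anger</title>
  </item>;

(: Careless: the src:title element itself, child src:em and all,
   is copied into the HTML. :)
let $careless := <h1>{ $item/src:title }</h1>

(: Flattened: string() (or a cast to xs:string) keeps only the
   text, so no source markup leaks into the output. :)
let $flat := <h1>{ string($item/src:title) }</h1>

return ($careless, $flat)
```

The flattened form also loses the em element; when inline markup has to survive as HTML, a cast won't do and you're back to walking the tree.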

And, well, there's that whole recursive-descent, walk-the-tree-yourself approach to converting from one XML vocabulary to another. That's just a drag; in XSLT, template rules and apply-templates do that walking for you. If XSLT has it and XQuery doesn't, XQuery is the weaker for it. Formatting dates and times, generating unique identifiers, grouping: who thought these were things you didn't need in XQuery?
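Here's roughly what that hand-written recursive descent looks like, as a minimal sketch; the src namespace and its title, para, and em elements are hypothetical. Each case of the typeswitch plays the role of an XSLT template, and the recursive call on $node/node() plays the role of xsl:apply-templates.

```xquery
xquery version "1.0";

declare namespace src = "http://example.com/src";

(: Map source elements to HTML by walking the tree explicitly. :)
declare function local:to-html($nodes as node()*) as node()*
{
  for $node in $nodes
  return
    typeswitch ($node)
      case element(src:title) return <h1>{ local:to-html($node/node()) }</h1>
      case element(src:para)  return <p>{ local:to-html($node/node()) }</p>
      case element(src:em)    return <em>{ local:to-html($node/node()) }</em>
      (: Unknown element: drop the tag but keep processing its
         content, much like XSLT's built-in template rule. :)
      case element()          return local:to-html($node/node())
      case text()             return $node
      (: Anything else (comments, processing instructions, and so
         on) is discarded. :)
      default                 return ()
};

local:to-html(
  <doc xmlns="http://example.com/src">
    <title>Using <em>XQuery</em> in anger</title>
    <para>Lessons <em>learned</em>.</para>
  </doc>)
```

Every new source element means another case clause, and there's no equivalent of import precedence or modes to help organize them, which is exactly the drag described above.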

If you're familiar with variables in XSLT, it's tempting to think of the XQuery let statement as roughly analogous. So you write things like this:


let $foo := some expression
something[@class=$foo]|something-else[contains(@role,$foo)]

And you discover that you've left out the return statement. Careful examination of the XQuery grammar will reveal that let is only allowed in a FLWOR expression; it's just that the for, where, and order by parts are optional. The return isn't. I kid you not.
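For the record, here's a working version of that snippet, as a sketch. The $doc variable and its contents are hypothetical, and the path is anchored on $doc because there may be no context item for the relative path to start from. The only essential change is the return that closes the FLWOR expression.

```xquery
xquery version "1.0";

(: Hypothetical input standing in for the real document. :)
declare variable $doc :=
  <div>
    <something class="note"/>
    <something-else role="footnote"/>
    <something class="other"/>
  </div>;

(: let opens a FLWOR expression, and every FLWOR expression ends
   with return; leave it off and the query is a syntax error. :)
let $foo := "note"
return $doc/(something[@class = $foo] | something-else[contains(@role, $foo)])
```

The union returns the first something element and the something-else element, in document order.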

Speaking of the return statement, am I the only one routinely tempted to put it at the end of function bodies, where it doesn't belong? Yeah, maybe I am.
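A small sketch of the distinction, with hypothetical local:total and local:doubled functions: a function body is a single expression in braces, so there's nothing for a return to attach to unless that expression happens to be a FLWOR.

```xquery
xquery version "1.0";

(: The body is just an expression. Writing
   { return sum($prices, 0) } here would be a syntax error. :)
declare function local:total($prices as xs:decimal*) as xs:decimal
{
  sum($prices, 0)
};

(: When the body is itself a FLWOR, its return belongs to that
   expression, not to the function. :)
declare function local:doubled($prices as xs:decimal*) as xs:decimal*
{
  for $p in $prices
  return $p * 2
};

local:total(local:doubled((1.5, 2.5)))
```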

Finally, there is a small corner of the XQuery specification which is deeply, fundamentally perverse. XQuery allows you to specify the default element namespace so that the bare element name “somename” in your query will be in the specified namespace. Unfortunately, doing that makes it impossible to write the name of an element that is not in a namespace. So the XQuery specification allows you to bind a prefix to the empty string:

declare namespace l = "";

You can then refer to elements in no namespace using that prefix, “l:someothername”. That is just wrong, wrong, wrong!

I concede that within the context of XQuery, it works and is entirely consistent, but it still makes me want to barf on my shoes. The problem is that the element name “<l:someothername/>” in an actual XML document can never be in no namespace. So the XQuery syntax in this regard seems like a really bad compromise to my eyes.
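Here's a sketch of the trap itself, using the XHTML namespace as the default and a hypothetical someothername element. Once the default is declared, the bare name in a path no longer reaches the no-namespace element. A wildcard step filtered on local-name() and namespace-uri() does reach it, without binding any prefix, though only at the cost of considerable verbosity.

```xquery
xquery version "1.0";

declare default element namespace "http://www.w3.org/1999/xhtml";

(: Hypothetical mixed document: xmlns="" puts someothername in no
   namespace, while div and p pick up the default (XHTML) one. :)
declare variable $doc :=
  <div>
    <p>in XHTML</p>
    <someothername xmlns="">in no namespace</someothername>
  </div>;

(: The bare name here means the XHTML element, so this step
   finds nothing. :)
let $missed := $doc/someothername

(: A wildcard plus an explicit namespace test reaches the
   no-namespace element. :)
let $found := $doc/*[local-name() eq "someothername"][namespace-uri() eq ""]

return (count($missed), count($found))
```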

None of these problems can be fixed at this stage, but none of them are show-stoppers. They're just idiosyncratic. I know Perl, I can deal with idiosyncrasy.

Comments

Besides casting to xs:string, there's also good old fn:string(), which has some recursive benefit. -m

So what is it good at, Norm? You've listed the odd bits. Does it have any real uses, any areas where it's better than XSLT 2.0?

The problem is that the element name “<l:someothername/>” in an actual XML document can never be in no namespace.

Is it possible that, in this instance, it's XML that's broken, not XQuery?

Having a default namespace for the bulk of your markup but being able to insert a few no-namespace elements doesn't seem completely ridiculous to me, though of course you might want to ask why the inserted elements aren't in a namespace. A possible use case would be an XSLT stylesheet which generates no-namespace documents, where you could avoid all the xsl: prefixes.

My main difficulty with XQuery, which I think is reflected a bit in your posting, is that it's a few different syntaxes combined, and if I haven't coded in it in a few months I get confused about which bits are supposed to go where: for example, when to use XML comments and when to use the adorable smiley-face comments. (Idiosyncratic as Perl is, I don't have that problem there.) But remember, XQuery wasn't designed to appeal to XML people; it's the XML manipulation language for people who don't like XML, with its curly braces and semicolons (just like a real programming language!) and the "SQL-like" keywords.

I don't hate XQuery, like I hated, say, OmniMark syntax, but I find it a bit frustrating at times. The real power is in the implementations: the ability to retrieve subsets of gigabytes of indexed XML purely based on element and attribute conditions, with no need for the storage program to "chunk" the storage into something that it can map to its legacy document storage technology.

Interesting to hear about your encounters with XQuery syntax here and via Twitter. I've just added a few of your gotchas to the XQuery Wikibook, at http://en.wikibooks.org/wiki/XQuery/Gotchas, which you may like to improve.

This XQuery resource is mainly examples of simple applications I'm interested in, but I'm working on a new section focused on patterns of XQuery application design. The book is rather eXist-focused because that's my environment, but I'd like to be able to include some Mark Logic examples, perhaps as comparative implementations of the same problem.