Content Negotiation

Volume 6, Issue 50; 02 Jul 2003

Content negotiation is a strategy for dealing with multiple representations of the same resource. It can cause some pretty subtle failures. Is it really worth it?

Content negotiation is a strategy for dealing with multiple representations of the same resource.

The canonical example of why I might want to use content negotiation goes something like this: suppose I have an SVG diagram that I want to publish. Ideally, I could just publish the SVG diagram, but SVG isn't supported by every browser out there so I might want to make other representations available too. I could publish a JPEG image as well, for example.

Now, if your browser understands SVG, I want to send you the SVG. If it doesn't understand SVG but it does understand JPEG, I want to send the JPEG. Similarly, I could fall back from JPEG to something else. (Fallback isn't the only use for content negotiation, as we'll see in a minute.)

To achieve this, the browser and the server “negotiate” with headers. Your browser sends a list of content types that it understands and the server consults the list of representation types it has and sends back the “best” match.

For example, my browser du jour sends the following Accept header:

Accept: text/xml,application/xml,application/xhtml+xml,
text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,
image/jpeg,image/gif;q=0.2,*/*;q=0.1

That means I'd get the JPEG image, since my browser doesn't list SVG (image/svg+xml) explicitly. The “*/*” on the end says that I'll accept anything you've got if you don't have something I've listed explicitly. The “q” value attempts to make that a low-priority option.
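To make the matching concrete, here is a minimal Python sketch of how a server might pick the “best” representation for that header. It is an illustration under assumptions, not a description of any particular server's algorithm: the function names (parse_accept, quality, best_match) are hypothetical, each representation takes the q value of the most specific range that matches it, and ties go to whichever representation the server lists first.

```python
def parse_accept(header):
    """Parse an Accept header into (type, subtype, q) triples."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # a range with no q parameter defaults to 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        mtype, _, subtype = parts[0].partition("/")
        ranges.append((mtype, subtype, q))
    return ranges


def quality(media_type, ranges):
    """Return the q value of the most specific range matching media_type."""
    mtype, _, subtype = media_type.partition("/")
    best_spec, best_q = -1, 0.0
    for rtype, rsub, q in ranges:
        if (rtype, rsub) == (mtype, subtype):
            spec = 2  # exact match, e.g. image/jpeg
        elif rtype == mtype and rsub == "*":
            spec = 1  # type wildcard, e.g. image/*
        elif (rtype, rsub) == ("*", "*"):
            spec = 0  # full wildcard
        else:
            continue
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q


def best_match(available, header):
    """Pick the available type with the highest q, or None if nothing is acceptable."""
    ranges = parse_accept(header)
    scored = [(quality(t, ranges), t) for t in available]
    q, chosen = max(scored, key=lambda pair: pair[0])
    return chosen if q > 0 else None


ACCEPT = (
    "text/xml,application/xml,application/xhtml+xml,"
    "text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,"
    "image/jpeg,image/gif;q=0.2,*/*;q=0.1"
)

print(best_match(["image/svg+xml", "image/jpeg"], ACCEPT))  # image/jpeg
```

In this sketch the SVG is still acceptable, but only through the wildcard at q=0.1, so the JPEG at the default q of 1.0 wins.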

Content negotiation is clearly useful, but it's not without its problems. One well-known problem concerns fragment identifiers. Fragment identifiers are strictly a client-side issue (the browser never sends them to the server), so they're oblivious to content negotiation.

If I serve up several representations of a resource, I'd better make sure that either fragment identifiers aren't used or that all of the representations have a common fragment identifier syntax. If #fragid points to a $100 credit in one representation and a $100 debit in another, that's a problem. It might even be perceived as fraud.
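A small Python sketch shows why the server can't help here. The URI and the helper name request_target are made up for illustration; urldefrag is the standard library call that splits a URI the way a client does before making a request.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """Split a URI into the part sent to the server and the fragment the client keeps."""
    url, fragment = urldefrag(uri)
    return url, fragment


sent, kept = request_target("http://example.org/ledger#fragid")
print(sent)  # http://example.org/ledger
print(kept)  # fragid
```

The server negotiates a representation for the first part only. The client then applies the fragment to whatever representation comes back.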

Content Negotiation on this Site

This site uses content negotiation to serve a variety of representations. For example, there are four representations of this document: HTML, XML, PDF, and RDF. There's no obvious fallback relationship here; they're just different representations.

One reader reported some problems this morning that I think trace back to one, possibly two, bugs in Internet Explorer, but the situation is not altogether obvious. It took several minutes and the kind assistance of a number of people on the #foaf IRC channel to work it out. (And beyond kind assistance, I'm grateful for the patience of the assembled masses for my completely off-topic thread on that channel.)

The first bug stems from Explorer's use of “*/*” as its default Accept header. I really think the client ought to list the types it knows about explicitly. The problem arose in part because this reader had installed some plugin to read PDF files. Installing the plugin had updated the Accept header to include application/pdf. So (ignoring some irrelevant MIME types) now the browser claimed:

Accept: application/pdf, */*

From my server's point of view, this makes PDF the “best match”. So every attempt to get a URI from this site returned a PDF file instead of an HTML file.
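The effect can be reproduced with a short Python sketch. Everything in it is an assumption made for illustration: the names score and choose are hypothetical, the media types in AVAILABLE are guesses at how the four representations might be labelled, and the tie-breaking rule (prefer the representation matched by the most specific range, then the one listed first) is just one plausible policy, not necessarily what any real server implements.

```python
def score(media_type, accept_header):
    """Return (q, specificity) for the most specific range matching media_type."""
    mtype, _, subtype = media_type.partition("/")
    best = (0.0, -1)  # (q, specificity); unmatched types stay at q=0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        rtype, _, rsub = parts[0].partition("/")
        q = 1.0  # no q parameter means q=1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        if (rtype, rsub) == (mtype, subtype):
            spec = 2  # named explicitly
        elif rtype == mtype and rsub == "*":
            spec = 1  # matched by a type wildcard
        elif (rtype, rsub) == ("*", "*"):
            spec = 0  # matched only by */*
        else:
            continue
        if spec > best[1]:
            best = (q, spec)
    return best


def choose(available, accept_header):
    """Pick the highest-scoring type; on a full tie, the first one listed wins."""
    return max(available, key=lambda t: score(t, accept_header))


# Hypothetical labels for the four representations, in server preference order.
AVAILABLE = ["text/html", "application/xml", "application/pdf", "application/rdf+xml"]

print(choose(AVAILABLE, "*/*"))                   # text/html
print(choose(AVAILABLE, "application/pdf, */*"))  # application/pdf
```

With a bare wildcard every representation ties and the server's own preference (HTML, in this sketch) decides. Once application/pdf is named explicitly it is the only type the client has singled out, so it wins on every request.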

That's bad enough. But a second bug made the situation even worse. Explorer discovers a PDF file coming down the wire at it and hands the content off to the plugin. Only it doesn't hand the actual bits to the plugin; instead, it hands over the URI. The plugin turns around and requests the content itself, only it uses a different set of headers. Instead of telling the server that it only understands PDF, it says something else (I don't know what). My server decides that HTML is the best match for this second request and hands back an HTML document, to which the plugin replies, “What the heck? This isn't a PDF file.”

Architecturally Dubious?

At this point, we're about eleven levels farther down in the web architecture than any mortal should have to tread. On the one hand, content negotiation offers a transparent solution to a tricky problem. On the other hand, the very transparency of such solutions makes them devilishly hard to understand when they stop working.

Content negotiation can cause some pretty subtle failures. Is it really worth it? Quite possibly. But if it starts getting used more widely, programmers and web designers are going to have to think hard about its implications.

Comments

"Content negotiation can cause some pretty subtle failures."

I disagree. B.A.D. clients can cause some pretty awful problems when they abuse the protocols and RFCs. It's not content negotiation that is to blame here.

"Is it really worth it?"

When everyone follows the rules, I think it's more than worth it.

I think it's time to reconsider content negotiation; see my blog post.

[Transparent] content negotiation is just yet another feature that the guys over at Microsoft have managed to break for everyone.

Is this problem of content negotiation now solved with the canonical tag?