Content negotiation is a strategy for dealing with multiple representations of the same resource. It can cause some pretty subtle failures. Is it really worth it?
The canonical example of why I might want to use content negotiation goes something like this: suppose I have an SVG diagram that I want to publish. Ideally, I could just publish the SVG diagram, but SVG isn't supported by every browser out there so I might want to make other representations available too. I could publish a JPEG image as well, for example.
Now, if your browser understands SVG, I want to send you the SVG. If it doesn't understand SVG but it does understand JPEG, I want to send the JPEG. Similarly, I could fall back from JPEG to something else. (Fallback isn't the only use for content negotiation, as we'll see in a minute.)
To achieve this, the browser and the server “negotiate” with headers. Your browser sends a list of content types that it understands and the server consults the list of representation types it has and sends back the “best” match.
For example, my browser du jour sends the following accept header:
Accept: text/xml,application/xml,application/xhtml+xml, text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png, image/jpeg,image/gif;q=0.2,*/*;q=0.1
That means I'd get the JPEG image, since I don't accept SVG. The “*/*” on the end says that I'll accept anything you've got if you don't have something I've listed explicitly. The “q” value attempts to make that a low priority.
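To make the matching concrete, here is a minimal sketch in Python of how a server might score that header against the two representations from the diagram example. The function names (parse_accept, quality, negotiate) and the MIME type image/svg+xml for the SVG are my own choices for illustration; real servers apply additional rules, so treat this as one plausible reading of the negotiation, not as any particular server's algorithm.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # a range without an explicit q counts as q=1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q of the most specific range that matches media_type."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2      # exact match, e.g. image/jpeg
        elif media_range == main_type + "/*":
            specificity = 1      # subtype wildcard, e.g. image/*
        elif media_range == "*/*":
            specificity = 0      # matches anything
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(header, available):
    """Pick the available type with the highest q; earlier entries win ties."""
    ranges = parse_accept(header)
    best = max(available, key=lambda t: quality(t, ranges))
    return best if quality(best, ranges) > 0 else None


ACCEPT = ("text/xml,application/xml,application/xhtml+xml,"
          "text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,"
          "image/jpeg,image/gif;q=0.2,*/*;q=0.1")

chosen = negotiate(ACCEPT, ["image/svg+xml", "image/jpeg"])
print(chosen)
```

In this sketch the SVG is matched only by the trailing wildcard, so it scores 0.1, while the JPEG is listed explicitly with no q value and scores 1.0.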
Content negotiation is clearly useful, but it's not without its problems. One well-known problem concerns fragment identifiers. Fragment identifiers are strictly a client-side issue, so they're oblivious to content negotiation.
If I serve up several representations of a resource, I'd better make sure that either fragment identifiers aren't used or that all of the representations have a common fragment identifier syntax. If #fragid points to a $100 credit in one representation and a $100 debit in another, that's a problem. It might even be perceived as fraud.
Content Negotiation on this Site
This site uses content negotiation to serve a variety of representations. For example, there are four representations of this document: HTML, XML, PDF, and RDF. There's no obvious fallback relationship here; they're just different representations.
One reader reported some problems this morning that I think trace back to one, possibly two, bugs in Internet Explorer, but the situation is not altogether obvious. It took several minutes, and the kind assistance of a number of people on the #foaf IRC channel, to work it out. (And beyond kind assistance, I'm grateful for the patience of the assembled masses for my completely off-topic thread on that channel.)
The first bug stems from Explorer's use of “*/*” as its default accept header. I really think the client ought to list the types it knows about explicitly. The problem arose in part because this reader had installed some plugin to read PDF files. Installing the plugin had updated the accept header to include application/pdf. So (ignoring some irrelevant MIME types) now the browser claimed:
Accept: application/pdf, */*
From my server's point of view, this makes PDF the “best match”. So every attempt to get a URI from this site returned a PDF file instead of an HTML file.
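Here is a small Python sketch of why that header tips the balance. I'm assuming a tie-break that prefers a type named explicitly over one matched only by a wildcard; that rule, the function names (rank, choose), and the MIME types I've given the four representations are all assumptions for illustration, not a description of how my server is actually configured.

```python
def rank(media_type, accept_header):
    """Return (q, specificity) of the most specific range matching media_type."""
    main_type = media_type.split("/")[0]
    # Higher numbers mean a more specific match.
    specificity_of = {media_type: 2, main_type + "/*": 1, "*/*": 0}
    best_q, best_spec = 0.0, -1
    for item in accept_header.split(","):
        media_range, _, params = item.strip().partition(";")
        spec = specificity_of.get(media_range.strip().lower())
        if spec is None or spec <= best_spec:
            continue
        q = 1.0  # no explicit q means q=1.0
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        best_q, best_spec = q, spec
    return (best_q, best_spec)


def choose(accept_header, available):
    # Compare q first, then specificity. max() keeps the first of equally
    # ranked entries, so list order acts as the server's default preference.
    return max(available, key=lambda t: rank(t, accept_header))


# Hypothetical MIME types for the four representations, HTML listed first.
SITE_TYPES = ["text/html", "application/xml",
              "application/pdf", "application/rdf+xml"]

before_plugin = choose("*/*", SITE_TYPES)
after_plugin = choose("application/pdf, */*", SITE_TYPES)
print(before_plugin, after_plugin)
```

With a bare “*/*” every representation ranks the same and the default (HTML, in this sketch) wins. Once application/pdf is named explicitly, PDF outranks everything that is matched only by the wildcard.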
That's bad enough. But a second bug made the situation even worse. Explorer discovers a PDF file coming down the wire at it and hands the content off to the plugin. Only it doesn't hand the actual bits to the plugin, instead it hands the URI to the plugin. The plugin turns around and requests the content itself. Only it uses a different set of headers. Instead of telling the server that it only understands PDF, it says something else (I don't know what). My server decides that HTML is the best match for this second request and hands back an HTML document, to which the plugin replies, “What the heck? This isn't a PDF file.”
At this point, we're about eleven levels farther down in the web architecture than any mortal should have to tread. On the one hand, content negotiation offers a transparent solution to a tricky problem. On the other hand, the very transparency of such solutions makes them devilishly hard to understand when they stop working.
Content negotiation can cause some pretty subtle failures. Is it really worth it? Quite possibly. But if it starts getting used more widely, programmers and web designers are going to have to think hard about its implications.