<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       version="pto">
<info>
<title>Content Negotiation</title>
<volumenum>6</volumenum>
<issuenum>50</issuenum>
<pubdate>2003-07-02</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2003</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Content negotiation is a strategy for dealing with multiple
representations of the same resource. It can cause some pretty subtle
failures. Is it really worth it?
</para>
</abstract>
</info>

<para xml:id='p1'>Content negotiation is a strategy for dealing with multiple
representations of the same resource.</para>

<para xml:id='p2'>The canonical example of why I might want to use content
negotiation goes something like this: suppose I have an SVG diagram
that I want to publish. Ideally, I could just publish the SVG diagram,
but SVG isn't supported by every browser out there so I might want to
make other representations available too. I could publish a JPEG image
as well, for example.</para>

<para xml:id='p3'>Now, if your browser understands SVG, I want to send you the
SVG. If it doesn't understand SVG but it does understand JPEG, I want
to send the JPEG. Similarly, I could fall back from JPEG to something
else. (Fallback isn't the only use for content negotiation, as we'll
see in a minute.)</para>

<para xml:id='p4'>To achieve this, the browser and the server <quote>negotiate</quote>
with headers. Your browser sends a list of content types that it understands
and the server consults the list of representation types it has and sends
back the <quote>best</quote> match.</para>

<para xml:id='p5'>For example, my browser du jour sends the following accept header:</para>

<screen>Accept: text/xml,application/xml,application/xhtml+xml,
text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,
image/jpeg,image/gif;q=0.2,*/*;q=0.1</screen>

<para xml:id='p6'>That means I'd get the JPEG image, since
<literal>image/jpeg</literal> is listed explicitly and SVG
(<literal>image/svg+xml</literal>) is not. The
<quote><literal>*/*</literal></quote> on the end says that I'll accept
anything you've got if you don't have something I've listed
explicitly. Its <quote>q</quote> value of 0.1 attempts to make that a low
priority option; a type listed without a <quote>q</quote> value gets the
default of 1.</para>
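
<para>A minimal Python sketch of the server's side of this exchange may help
make the matching concrete. It is written under simplifying assumptions and is
not how any particular server implements negotiation: the function names
(<literal>parse_accept</literal>, <literal>quality</literal>,
<literal>best_match</literal>) are invented for illustration, and media type
parameters other than <quote>q</quote> are ignored.</para>

<programlisting language="python"><![CDATA[
```python
def parse_accept(header):
    """Parse an Accept header value into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def quality(media_type, ranges):
    """Return the q value of the most specific range that matches media_type."""
    main_type = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2          # exact match, e.g. image/jpeg
        elif media_range == main_type + "/*":
            specificity = 1          # subtype wildcard, e.g. image/*
        elif media_range == "*/*":
            specificity = 0          # matches anything
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def best_match(available, header):
    """Pick the available type with the highest q, or None if none is acceptable."""
    ranges = parse_accept(header)
    scored = [(quality(t, ranges), t) for t in available]
    scored = [(q, t) for q, t in scored if q > 0]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


ACCEPT = ("text/xml,application/xml,application/xhtml+xml,"
          "text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,"
          "image/jpeg,image/gif;q=0.2,*/*;q=0.1")

# The SVG only matches */* (q=0.1); the JPEG is listed explicitly (q=1).
print(best_match(["image/svg+xml", "image/jpeg"], ACCEPT))  # prints image/jpeg
```
]]></programlisting>

<para>With the header shown above, the SVG representation scores 0.1 through
the wildcard and the JPEG scores 1, so the sketch picks the JPEG.</para>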

<para xml:id='p7'>Content negotiation is clearly useful, but it's not without its problems.
One <link xlink:href="http://www.w3.org/TR/webarch/#frag-conneg">well-known</link>
problem concerns fragment identifiers. Fragment identifiers are strictly
a client-side issue, so they're oblivious to content negotiation.</para>

<para xml:id='p8'>If I serve up several representations of a resource, I'd better
make sure that either fragment identifiers aren't used or that all of
the representations have a common fragment identifier syntax. If
<literal>#fragid</literal> points to a $100 credit in one
representation and a $100 debit in another, that's a problem.
It might even be perceived as fraud.</para>

<section xml:id='s1'>
<title>Content Negotiation on this Site</title>

<para xml:id='p9'>This site uses content negotiation to serve a variety of
representations. For example, there are four representations of this
document: HTML, XML, PDF, and RDF. There's no obvious fallback relationship here;
they're just different representations.</para>

<para xml:id='p10'>One reader reported some problems this morning that I think
trace back to one, possibly two, bugs in <application>Internet
Explorer</application>, but the situation is not altogether obvious.
It took several minutes, and the kind assistance of a number of people
on the <link xlink:href="irc:irc.freenode.net#foaf">#foaf</link> IRC
channel to work it out. (And beyond kind assistance, I'm grateful for
the patience of the assembled masses for my completely off-topic
thread on that channel.)</para>

<para xml:id='p11'>The first bug stems from Explorer's use of <quote><literal>*/*</literal></quote>
as its default accept header. I really think the client ought to list the 
types it knows about explicitly. The problem arose in part because this reader had
installed some plugin to read PDF files. Installing the plugin had updated
the accept header to include <literal>application/pdf</literal>. So (ignoring
some irrelevant MIME types) now
the browser claimed:</para>

<screen>Accept: application/pdf, */*</screen>

<para xml:id='p12'>From my server's point of view, this makes PDF the <quote>best match</quote>.
So every attempt to get a URI from this site returned a PDF file instead of
an HTML file.</para>
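
<para>One way to see how that can happen is the following Python sketch. It
rests on several assumptions that are mine, not a description of the actual
server: the media types chosen for the four representations, the order of the
<literal>AVAILABLE</literal> list (standing in for the server's own
preference), and the tie-breaking rule that an explicitly listed type beats a
type matched only by a wildcard when their <quote>q</quote> values are equal.
The names <literal>rank</literal> and <literal>choose</literal> are
hypothetical.</para>

<programlisting language="python"><![CDATA[
```python
# Assumed media types for the HTML, XML, PDF, and RDF representations,
# listed in an assumed order of server preference.
AVAILABLE = ["text/html", "text/xml", "application/pdf", "application/rdf+xml"]


def rank(media_type, accept):
    """Return (q, specificity) for media_type; higher tuples win."""
    best = (0.0, -1)  # nothing matched yet
    main_type = media_type.split("/")[0]
    for item in accept.split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        q = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if parts[0] == media_type:
            specificity = 2      # listed explicitly
        elif parts[0] == main_type + "/*":
            specificity = 1      # subtype wildcard
        elif parts[0] == "*/*":
            specificity = 0      # matches anything
        else:
            continue
        # The most specific matching range decides the rank.
        if specificity > best[1]:
            best = (q, specificity)
    return best


def choose(accept):
    """Pick the best representation; ties go to the earliest entry in AVAILABLE."""
    return max(AVAILABLE, key=lambda t: rank(t, accept))


print(choose("*/*"))                   # prints text/html
print(choose("application/pdf, */*"))  # prints application/pdf
```
]]></programlisting>

<para>Under these assumptions every representation has a <quote>q</quote> of 1
in both cases. With a bare <literal>*/*</literal> the tie goes to the server's
first choice, but once <literal>application/pdf</literal> is named explicitly
it outranks everything that matches only the wildcard.</para>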

<para xml:id='p13'>That's bad enough. But a second bug made the situation even
worse. Explorer discovers a PDF file coming down the wire at it and
hands the content off to the plugin. Only it doesn't hand the actual
bits to the plugin, instead it hands the URI to the plugin. The plugin
turns around and requests the content itself. Only it uses a different
set of headers. Instead of telling the server that it only understands PDF,
it says something else (I don't know what). My server decides that HTML
is the best match for this second request and hands back an HTML document,
to which the plugin replies, <quote>What the heck? This isn't a PDF file.</quote>
</para>

</section>

<section xml:id='s2'>
<title>Architecturally Dubious?</title>

<para xml:id='p14'>At this point, we're about eleven levels farther down in the web
architecture than any mortal should have to tread. On the one hand,
content negotiation offers a transparent solution to a tricky problem.
On the other hand, the very transparency of such solutions makes them
devilishly hard to understand when they stop working.</para>

<para xml:id='p15'>Content negotiation can cause some pretty subtle failures. Is it
really worth it? Quite possibly. But if it starts getting used more
widely, programmers and web designers are going to have to think hard
about its implications.</para>
</section>

</essay>
