Of XML documents and media types. Are namespaces sometimes redundant? How much are you willing to infer?
In real life, unlike in Shakespeare, the sweetness of the rose depends upon the name it bears. Things are not only what they are. They are, in very important respects, what they seem to be.
Over lunch on Friday, Dan reintroduced the idea of implicit, or what he called “magic”, namespaces. This is an idea that the TAG discussed some number of years ago, but which hasn't been discussed much recently. I don't recall that it was rejected outright; it just drifted out of focus. I have no idea who originally thought it up.
If we accept as a principle of web architecture that important things should be identified with URIs, then the following XML document is clearly deficient:
<title>Ford Prefect</title>
The “title” is presumably important, but it has no URI. Is that an HTML title, a DocBook title, an SVG title, or some other kind of title?
It would be clear, to the extent that its identity constitutes some sort of clarity, if it were in a namespace:
<title xmlns="http://example.com/ns/">Ford Prefect</title>
(Cue offline discussion of namespaces, namespace documents, the self describing web, the “meaning” of XML documents, etc., etc., etc.)
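To make the difference concrete, here is a minimal sketch of how a namespace-aware parser reports the two flavors of title. It uses Python's standard `xml.etree.ElementTree`, which writes expanded names as `{namespace-uri}local-name`; the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# A title with no namespace: the parser can only report its local name.
bare = ET.fromstring("<title>Ford Prefect</title>")

# The same title with a default namespace declaration.
qualified = ET.fromstring(
    '<title xmlns="http://example.com/ns/">Ford Prefect</title>'
)

print(bare.tag)       # title
print(qualified.tag)  # {http://example.com/ns/}title
```

The second name carries a URI along with it; the first is just a string that any vocabulary could claim.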
For documents that use a large and/or arbitrary set of namespaces, explicit namespace declarations are arguably cost effective. Consider the XSLT stylesheet that this site uses to construct HTML essays from DocBook sources; it begins:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:atom="http://www.w3.org/2005/Atom"
                xmlns:c="http://nwalsh.com/rdf/contacts#"
                xmlns:cvs="http://nwalsh.com/rdf/cvs#"
                xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
                xmlns:db="http://docbook.org/ns/docbook"
                xmlns:dbf="http://docbook.org/xslt/ns/extension"
                xmlns:dbm="http://docbook.org/xslt/ns/mode"
                xmlns:dc='http://purl.org/dc/elements/1.1/'
                xmlns:dcterms="http://purl.org/dc/terms/"
                xmlns:f="http://nwalsh.com/ns/xslfunctions#"
                xmlns:foaf="http://xmlns.com/foaf/0.1/"
                xmlns:gal='http://norman.walsh.name/rdf/gallery#'
                xmlns:geo='http://www.w3.org/2003/01/geo/wgs84_pos#'
                xmlns:html="http://www.w3.org/1999/xhtml"
                xmlns:itin="http://nwalsh.com/rdf/itinerary#"
                xmlns:m="http://docbook.org/xslt/ns/mode"
                xmlns:out="http://docbook.org/xslt/ns/output"
                xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
                xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
                xmlns:skos="http://www.w3.org/2004/02/skos/core#"
                xmlns:t="http://norman.walsh.name/knows/taxonomy#"
                xmlns:tmpl="http://docbook.org/xslt/ns/template"
                xmlns:ttag="http://developers.technorati.com/wiki/RelTag#"
                xmlns:xlink="http://www.w3.org/1999/xlink"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                …>
It may not be pretty, but the namespace declaration mechanism gives me uniform access to an ad hoc collection of public and private namespaces. Several of the RDF documents on this site begin with an equally large number of bindings.
But Dan's focus these days isn't on a tag set which requires, or maybe even allows, a large or arbitrary set of namespaces. He's co-chairing one of the HTML working groups. (Poor sod.)
HTML, and perhaps equally other document types that live in basically a single namespace, is an interesting case. Recall that web architecture deals almost exclusively in typed representations. In other words, I don't just get a bag of bits when I ask for something with HTTP; I get the bits and an internet media type label so I know how to interpret the bits.
Consider what this means for HTML. I ask for a representation and I get a media type, “text/html”, and a sequence of bits:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My HTML Document</title>
</head>
<body>
…
</body>
</html>
(Cue offline discussion of the right media type for HTML content.)
Now ask yourself, if you know that you were sent HTML, isn't that namespace binding in some sense redundant? If I had accidentally (or intentionally) omitted it, couldn't you have inferred it? After all, nothing else is allowed.
This is the heart of the “implicit namespaces” idea. If the media type registration for a particular media type identifies a default namespace for documents of that type, then the parser could infer a default namespace binding for that namespace. You can even take this a step further if you're feeling really aggressive. If HTML includes a normative reference to SVG, then you could infer not only that the <html> document element has the HTML namespace by default, but also that any <svg> tag in that document has the SVG namespace by default.
Naturally, these defaults only apply to elements that aren't in any namespace. They don't interfere with my ability to experiment with my own Superior Vehicle Grounding (<svg>) markup, provided that I give it an explicit namespace.
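The inference described above might be sketched as a post-parse pass like the following. Everything here is an assumption made for illustration: the two lookup tables stand in for whatever a media type registration would actually assert, `apply_implicit_namespaces` is an invented name, and `http://example.com/grounding` is a made-up namespace for the Superior Vehicle Grounding markup. Only the XHTML and SVG namespace URIs are real.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the default namespace a media type
# registration would declare.
IMPLICIT_DEFAULT_NS = {
    "text/html": "http://www.w3.org/1999/xhtml",
}

# Hypothetical "aggressive" extension: element names that switch the
# implied namespace for themselves and their un-namespaced descendants.
IMPLICIT_ELEMENT_NS = {
    "text/html": {"svg": "http://www.w3.org/2000/svg"},
}


def apply_implicit_namespaces(root, media_type):
    """Put un-namespaced elements into the namespace implied by media_type.

    Elements that already have a namespace are never touched.
    """
    default_ns = IMPLICIT_DEFAULT_NS.get(media_type)
    if default_ns is None:
        return root  # nothing is implied for this media type
    element_ns = IMPLICIT_ELEMENT_NS.get(media_type, {})

    def visit(elem, inherited_ns):
        ns = inherited_ns
        # ElementTree writes namespaced names as "{uri}local".
        if isinstance(elem.tag, str) and not elem.tag.startswith("{"):
            ns = element_ns.get(elem.tag, inherited_ns)
            elem.tag = "{" + ns + "}" + elem.tag
        for child in elem:
            visit(child, ns)

    visit(root, default_ns)
    return root


doc = ET.fromstring(
    "<html><head><title>My HTML Document</title></head>"
    "<body><svg><circle r='1'/></svg>"
    "<svg xmlns='http://example.com/grounding'/></body></html>"
)
apply_implicit_namespaces(doc, "text/html")
for elem in doc.iter():
    print(elem.tag)
```

In this sketch the bare `<html>` tree lands in the XHTML namespace, the bare `<svg>` and its `<circle>` land in the SVG namespace, and the explicitly namespaced `<svg>` is left alone. A real design would have to do this inside the parser rather than after it, since ElementTree cannot tell an element with no declaration from one that deliberately says `xmlns=""`.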
So, with a little bit of technical cleanup (assuring, for example, that media types have URIs and that they contain a machine readable assertion of the default namespace), we could technically build a compatible, follow-your-nose web architecture that provided implicit namespace bindings for any vocabulary sufficiently popular to have its own media type.
But, man, that's a lot of work. It certainly requires a lot of scaffolding behind the scenes to hold up the facade that some documents that don't have namespace bindings are, in fact, in a namespace.
There's also the nasty case of what to do when the representation loses its associated media type (when it's stored on a local file system, for example). We already have that problem to a certain extent, but this exacerbates it, for sure.
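One common stopgap once the label is gone is to guess it back from the filename. A minimal sketch using Python's standard `mimetypes` module follows; the helper name `media_type_for_local_file` is invented for illustration, and the guess is only ever as good as the file extension.

```python
import mimetypes


def media_type_for_local_file(filename):
    """Best-effort guess from the filename; None when there is no hint."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


print(media_type_for_local_file("essay.html"))  # text/html
print(media_type_for_local_file("essay"))       # None
```

A file saved without a recognizable extension yields no media type at all, and so nothing to hang an implicit namespace on.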
HTML is the most widely deployed markup application ever. Sometimes, I think the folks designing HTML have some really hard constraints imposed upon them by market forces that they have no leverage to control. They simply can't participate in the XML community unless we help them build bridges from that world to ours.
And sometimes I think the folks designing HTML don't care any more about XML now than they cared about SGML before it. They have market forces to serve and those forces never cared about the well-formedness of pages on the web. In that light, it hardly seems cost-effective to add arbitrary complexity to the XML story in order to maintain an illusion that no one really believes.
The truth probably lies somewhere in the middle. And the implicit namespaces story doesn't seem actively harmful, but it might require a little practice to tell with a straight face.