Of XML documents and media types. Are namespaces sometimes redundant? How much are you willing to infer?

In real life, unlike in Shakespeare, the sweetness of the rose depends upon the name it bears. Things are not only what they are. They are, in very important respects, what they seem to be.

Hubert H. Humphrey

Over lunch on Friday, Dan reintroduced the idea of implicit, or what he called “magic”, namespaces. This is an idea that the TAG discussed some number of years ago, but which hasn't been discussed much recently. I don't recall that it was rejected outright; it just drifted out of focus. I have no idea who originally thought it up.

If we accept as a principle of web architecture that important things should be identified with URIs, then the following XML document is clearly deficient:

  <title>Ford Prefect</title>

The “title” is presumably important, but it has no URI. Is that an HTML title, a DocBook title, an SVG title, or some other kind of title?

It would be clear, to the extent that its identity constitutes some sort of clarity, if it were in a namespace:

  <title xmlns="http://example.com/ns/">Ford Prefect</title>
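
An XML parser makes the difference concrete. Python's `xml.etree.ElementTree` reports each element's expanded name in “Clark notation”, `{namespace-URI}local-name`. In this minimal sketch the bare `title` comes back with no URI at all. The second one carries the essay's placeholder namespace, `http://example.com/ns/`.

```python
import xml.etree.ElementTree as ET

# The two documents from the text: one without a namespace, one with.
bare = ET.fromstring("<title>Ford Prefect</title>")
named = ET.fromstring('<title xmlns="http://example.com/ns/">Ford Prefect</title>')

# ElementTree reports expanded names as {namespace-URI}local-name.
print(bare.tag)   # title
print(named.tag)  # {http://example.com/ns/}title
```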

(Cue offline discussion of namespaces, namespace documents, the self describing web, the “meaning” of XML documents, etc., etc., etc.)

For documents that use a large and/or arbitrary set of namespaces, explicit namespace declarations are arguably cost-effective. Consider the XSLT stylesheet that this site uses to construct HTML essays from DocBook sources; it begins:

  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                  xmlns="http://www.w3.org/1999/xhtml"
                  xmlns:atom="http://www.w3.org/2005/Atom"
                  xmlns:c="http://nwalsh.com/rdf/contacts#"
                  xmlns:cvs="http://nwalsh.com/rdf/cvs#"
                  xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
                  xmlns:db="http://docbook.org/ns/docbook"
                  xmlns:dbf="http://docbook.org/xslt/ns/extension"
                  xmlns:dbm="http://docbook.org/xslt/ns/mode"
                  xmlns:dc='http://purl.org/dc/elements/1.1/'
                  xmlns:dcterms="http://purl.org/dc/terms/"
                  xmlns:f="http://nwalsh.com/ns/xslfunctions#"
                  xmlns:foaf="http://xmlns.com/foaf/0.1/"
                  xmlns:gal='http://norman.walsh.name/rdf/gallery#'
                  xmlns:geo='http://www.w3.org/2003/01/geo/wgs84_pos#'
                  xmlns:html="http://www.w3.org/1999/xhtml"
                  xmlns:itin="http://nwalsh.com/rdf/itinerary#"
                  xmlns:m="http://docbook.org/xslt/ns/mode"
                  xmlns:out="http://docbook.org/xslt/ns/output"
                  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
                  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
                  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
                  xmlns:t="http://norman.walsh.name/knows/taxonomy#"
                  xmlns:tmpl="http://docbook.org/xslt/ns/template"
                  xmlns:ttag="http://developers.technorati.com/wiki/RelTag#"
                  xmlns:xlink="http://www.w3.org/1999/xlink"
                  xmlns:xs="http://www.w3.org/2001/XMLSchema"
                  >

It may not be pretty, but the namespace declaration mechanism gives me uniform access to an ad hoc collection of public and private namespaces. Several of the RDF documents on this site begin with an equally large number of bindings.
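
The bindings are only a convenience for the author. The stylesheet binds both the default namespace and the `html:` prefix to the XHTML namespace, and it binds both `dbm:` and `m:` to the same mode namespace. After parsing, the prefixes are gone and only the URIs remain. This small Python sketch shows the effect on a cut-down, hypothetical document (not the real stylesheet):

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down document: the default namespace and the "html"
# prefix are both bound to the XHTML namespace URI.
doc = ET.fromstring(
    '<root xmlns="http://www.w3.org/1999/xhtml"'
    ' xmlns:html="http://www.w3.org/1999/xhtml">'
    '<p/><html:p/>'
    '</root>'
)

first, second = list(doc)
# The parser discards prefixes; both elements have the same expanded name.
print(first.tag)                # {http://www.w3.org/1999/xhtml}p
print(second.tag)               # {http://www.w3.org/1999/xhtml}p
print(first.tag == second.tag)  # True
```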

But Dan's focus these days isn't on a tag set which requires, or maybe even allows, a large or arbitrary set of namespaces. He's co-chairing one of the HTML working groups. (Poor sod.)

HTML, and perhaps equally other document types that are basically in a single namespace, are an interesting case. Recall that web architecture deals almost exclusively in typed representations. In other words, I don't just get a bag of bits when I ask for something with HTTP; I get the bits and an Internet media type label so I know how to interpret the bits.

Consider what this means for HTML. I ask for a representation and I get a media type, “text/html”, and a sequence of bits:

  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <title>My HTML Document</title>
  </head>
  <body>

  </body>
  </html>

(Cue offline discussion of the right media type for HTML content.)

Now ask yourself, if you know that you were sent HTML, isn't that namespace binding in some sense redundant? If I had accidentally (or intentionally) omitted it, couldn't you have inferred it? After all, nothing else is allowed.

This is the heart of the “implicit namespaces” idea. If the media type registration for a particular media type identifies a default namespace for documents of that type, then the parser could infer a default namespace binding for that namespace. You can even take this a step further if you're feeling really aggressive. If HTML includes a normative reference to SVG, then you could infer not only that the <html> document element has the HTML namespace by default, but also that any <svg> tag in that document has the SVG namespace by default.

Naturally, these defaults only apply to elements that aren't in any namespace. They don't interfere with my ability to experiment with my own Superior Vehicle Grounding (<svg>) markup, provided that I give it an explicit namespace.
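
Here is a minimal Python sketch of how a processor might apply that inference after an ordinary XML parse. It rests on two hypothetical tables. `DEFAULT_NAMESPACE` stands in for a machine-readable default-namespace assertion in the media type registration. `EMBEDDED_VOCABULARIES` stands in for HTML's normative references. No real registration carries either. The `aggressive` flag switches on the SVG (and MathML) step.

```python
import xml.etree.ElementTree as ET

# Hypothetical: what a media type registration might assert if it carried a
# machine-readable default namespace. No real registration does this.
DEFAULT_NAMESPACE = {
    "text/html": "http://www.w3.org/1999/xhtml",
}

# Hypothetical: vocabularies the host language normatively references, keyed
# by the local name of their root element (the "aggressive" step).
EMBEDDED_VOCABULARIES = {
    "text/html": {
        "svg": "http://www.w3.org/2000/svg",
        "math": "http://www.w3.org/1998/Math/MathML",
    },
}


def infer_namespaces(root, media_type, aggressive=False):
    """Give every element that has no namespace an inferred one, in place.

    Elements that already have a namespace are left exactly as they are.
    """
    default_ns = DEFAULT_NAMESPACE.get(media_type)
    if default_ns is None:
        return root  # unknown or missing media type: infer nothing
    embedded = EMBEDDED_VOCABULARIES.get(media_type, {}) if aggressive else {}

    def walk(elem, inherited_ns):
        ns = inherited_ns
        if not elem.tag.startswith("{"):  # "{uri}name" means already namespaced
            # An un-namespaced <svg> or <math> switches namespace for itself
            # and for the un-namespaced elements inside it.
            ns = embedded.get(elem.tag, inherited_ns)
            elem.tag = "{%s}%s" % (ns, elem.tag)
        for child in elem:
            walk(child, ns)

    walk(root, default_ns)
    return root


page = ET.fromstring(
    "<html><body>"
    "<svg><circle r='5'/></svg>"
    "<svg xmlns='http://example.com/ns/vehicles'/>"
    "</body></html>"
)
infer_namespaces(page, "text/html", aggressive=True)
tags = [e.tag for e in page.iter()]
for tag in tags:
    print(tag)
```

Only elements that arrive with no namespace are touched. The explicitly namespaced `<svg>` therefore keeps its own namespace; it uses a made-up `http://example.com/ns/vehicles` URI for the Superior Vehicle Grounding vocabulary. Without `aggressive=True`, the same un-namespaced `<svg>` lands in the XHTML namespace. With an unknown or missing media type, nothing is inferred at all.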

So, with a little bit of technical cleanup (ensuring, for example, that media types have URIs and that they contain a machine-readable assertion of the default namespace), we could build a compatible, follow-your-nose web architecture that provided implicit namespace bindings for any vocabulary sufficiently popular to have its own media type.

But, man, that's a lot of work. It certainly requires a lot of scaffolding behind the scenes to hold up the facade that some documents that don't have namespace bindings are, in fact, in a namespace.

There's also the nasty case of what to do when the representation loses its associated media type, when it's stored on a local file system, for example, but we already have that problem to a certain extent. This exacerbates the problem though, for sure.
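
Once the bits are stored locally, the media type is typically gone. Anything that wants to infer a namespace then has to guess the type from the file name. Here is a minimal sketch using Python's standard `mimetypes` module; `guess_media_type` is a hypothetical wrapper name. A file saved without a recognizable extension gets no type, and therefore no implicit namespace.

```python
import mimetypes


def guess_media_type(path):
    """Best-effort media type for a file with no HTTP headers attached.

    The only evidence left is the file name, so this relies on the
    extension table; it returns None when it cannot tell.
    """
    media_type, _encoding = mimetypes.guess_type(path)
    return media_type


print(guess_media_type("essay.html"))  # text/html
print(guess_media_type("essay"))       # None
```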

HTML is the most widely deployed markup application ever. Sometimes, I think the folks designing HTML have some really hard constraints imposed upon them by market forces that they have no leverage to control. They simply can't participate in the XML community unless we help them build bridges from that world to ours.

And sometimes I think the folks designing HTML don't care any more about XML now than they cared about SGML before it. They have market forces to serve and those forces never cared about the well-formedness of pages on the web. In that light, it hardly seems cost-effective to add arbitrary complexity to the XML story in order to maintain an illusion that no one really believes.

The truth probably lies somewhere in the middle. And the implicit namespaces story doesn't seem actively harmful, but it might require a little practice to tell with a straight face.

Comments:

Listen, if I can <span xml:lang="sco">threep it doun the thrapples</span> of TagSoup users that HTML script and style elements have SGML CDATA content models, a little bit of noise around implicit namespaces shouldn't be so very hard to swallow.

Posted by John Cowan on 12 Nov 2007 @ 11:39pm UTC #

I think there are additional problems apart from the possible loss of the content-type label (which is bad enough).

What if the content-type is wrong? In the case of xml where the content encoding clashes with the xml declaration, there is a clear error situation. With implicit namespaces there is not (unless the parser is also supposed to infer the namespace from the root element name - I think not!), so mis-labelled entities may not be easily detected.

And why pretend that HTML is actually XHTML (which inferring the XHTML namespace seems to imply)? This is going to cause real problems if an HTML browser sees an svg tag (without a namespace) and treats it as SVG. This is the sort of thing fault-tolerant, nay, fault-ENCOURAGING technology currently does (I don't know if it does it for SVG, but if not now, it probably will soon).

The XML parser is going to infer the XHTML namespace for the svg element (and so throw a validation error if it is a validating parser). So we are getting different treatment between the browser and the XML processor.

I can't see any benefit in any of this.

Posted by Colin Adams on 14 Nov 2007 @ 09:15am UTC #

Given that html5 is to a large extent trying to specify parsing behaviour similar to that of existing browsers, (but without a dtd) it could be argued that defaulting the namespace is closer to current behaviour than not defaulting it. If you use the xhtml+svg+mathml dtd for example, it defaults all three main namespaces (using dtd attribute defaulting, because it's a dtd:-) but the end result for the user is that assuming (possibly falsely) that the system reads the dtd the user can already go ............

Certainly in an HTML context, if "foreign" languages such as math and svg are going to be allowed (yes please!) you probably don't want to force the user to use the explicit xml-inspired namespace markup.

The amount of "harm" done by these kinds of defaults depends a lot on where the defaulting happens. If you have a fragment of MathML in your browser, you can (today) cut and paste that fragment and drop it into another MathML application (Microsoft Word, to give a somewhat surprising but pleasing example). If, however, the stuff that is pasted in isn't namespaced, then it won't work (getting XSLT 1.0 to work with elements that may or may not be in a namespace is a pain, as you must know from DocBook). If, however, the defaulting happens on parsing, the DOM is fully namespaced, and what is killed and yanked is a linearisation of that DOM fragment rather than a fragment of the source, then defaulting namespaces (and other HTML-style defaults such as end tag omission) is far less intrusive.

Posted by David Carlisle on 14 Nov 2007 @ 11:11am UTC #
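
Here is a minimal Python sketch of the parse-time case David Carlisle describes. The defaulting is applied to the tree, and the copied fragment is a serialization of that tree rather than a slice of the source. The single defaulting rule is a hypothetical stand-in for real HTML parsing: an un-namespaced `<svg>` and its descendants go into the SVG namespace. The serializer then has to write out a namespace declaration, so the fragment stays meaningful when pasted elsewhere.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Source markup with no namespace declarations at all.
page = ET.fromstring("<html><body><svg><circle r='5'/></svg></body></html>")

# Hypothetical parse-time rule: an un-namespaced <svg> and every
# un-namespaced element under it are moved into the SVG namespace.
for svg in list(page.iter("svg")):
    for elem in svg.iter():
        if not elem.tag.startswith("{"):
            elem.tag = "{%s}%s" % (SVG_NS, elem.tag)

# Copy the fragment by serializing the tree, not by slicing the source text;
# the serializer emits an xmlns declaration for SVG_NS.
fragment = ET.tostring(page.find("body/{%s}svg" % SVG_NS), encoding="unicode")
print(fragment)
```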

I saw two problems, over all others, from XML in-the-trenches and XML in-the-standards.

1: specifically, acquisition/inheritance in markup sucks.

2: generally, implicitness in markup sucks.

I'm not sure what the idea gets you.

"There's also the nasty case of what to do when the representation loses its associated media type, when it's stored on a local file system, for example, but we already have that problem to a certain extent. This exacerbates the problem though, for sure."

I was thinking there's a big assumption for the proposal to work: that markup is being _sent_ (via MIME-ish protocols). That's aside from whether browsers et al. are up to this at all (try sending an Atom document into Firefox with an embedded XHTML namespace; I had to make late changes to a project last year to deal with that).

Posted by Bill de hOra on 14 Nov 2007 @ 11:27pm UTC #

When we were designing namespaces I really really wanted the namespace URI to identify a catalogue or manifest of some kind that would let you combine namespaces in a meaningful way. You'd say that your foo namespace was actually the union of HTML, SVG and Bird Watching Report, for example, with unlabeled conflicts resolved in that order.

It didn't get traction for a number of reasons, and I suspect it's too late now. I can imagine some kind of Namespace Definition Language for the purpose, with only a little effort...

I think in practice "implicit namespaces" make perfect sense, although I agree about the problem when content becomes disassociated from a MIME media type, which is almost any time anyone does anything with it beyond looking at it in a Web browser, unfortunately.

Posted by Liam Quin on 15 Nov 2007 @ 07:53pm UTC #
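
The union Liam Quin describes could be sketched as an ordered lookup table, where the first vocabulary that defines a name wins. Everything here is hypothetical: the manifest format, the tiny subsets of element names (the real vocabularies are much larger), and the bird-watching namespace URI.

```python
# Hypothetical "namespace manifest": an ordered union of vocabularies.
# Each entry is (namespace URI, local names it defines); the name sets are
# tiny illustrative subsets and the bird-watching vocabulary is invented.
MANIFEST = [
    ("http://www.w3.org/1999/xhtml", {"html", "body", "p", "title", "a"}),
    ("http://www.w3.org/2000/svg", {"svg", "circle", "title", "a"}),
    ("http://example.com/ns/birdwatching", {"sighting", "species", "title"}),
]


def resolve(local_name, manifest=MANIFEST):
    """Return the namespace for an unprefixed name; earlier entries win."""
    for uri, names in manifest:
        if local_name in names:
            return uri
    return None  # not defined by any vocabulary in the union


print(resolve("circle"))    # http://www.w3.org/2000/svg
print(resolve("title"))     # http://www.w3.org/1999/xhtml (HTML is listed first)
print(resolve("sighting"))  # http://example.com/ns/birdwatching
print(resolve("wombat"))    # None
```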

I suggested several things similar to that while I was on the TAG -- basically assuming a default URI for a given protocol/format and all undecorated names would be "resolved" relative to that URI. It wasn't exactly shot down as much as it wasn't preferred within the realm of XML (I think we were discussing id attributes at the time). A similar suggestion was later adopted for link relation names in Atom.

Posted by Roy T. Fielding on 16 Nov 2007 @ 06:47pm UTC #
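
RFC 4287 (section 4.2.7.2) specifies the Atom rule Roy Fielding mentions. A bare `rel` name is equivalent to that name appended to `http://www.iana.org/assignments/relation/`. A value that is already an IRI is used as-is, and a missing `rel` means `alternate`. Here is a minimal Python sketch of that resolution. The function name `expand_rel` is illustrative, and testing for a URI scheme is a simplification of the RFC's grammar.

```python
from urllib.parse import urlsplit

# From RFC 4287, section 4.2.7.2: a bare relation name is equivalent to
# this string with the name appended.
IANA_RELATION_BASE = "http://www.iana.org/assignments/relation/"


def expand_rel(rel=None):
    """Map an Atom rel value to the IRI that identifies the relation type."""
    if rel is None:
        rel = "alternate"  # a missing rel is interpreted as "alternate"
    if urlsplit(rel).scheme:
        return rel  # already an absolute IRI (simplified check)
    return IANA_RELATION_BASE + rel


print(expand_rel("alternate"))  # http://www.iana.org/assignments/relation/alternate
print(expand_rel("http://example.com/rels/sighting"))  # returned unchanged
print(expand_rel())             # http://www.iana.org/assignments/relation/alternate
```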

The deeper problem in this question is: what does a parser or processor do if it finds a node from a foreign namespace? You start processing your document from the top, knowing full well how to, based on the MIME type. Now you've hit a snag: someone has decided to embed some SVG, but unless it fits semantically into the existing document structure there is no way that a processor can know what to do with it.

My view is that it usually isn't meaningful to add namespaces to XML elements or attributes. It is only meaningful to talk about the type of a whole sub-document, found somewhere useful such as the "content" element of an atom document. RDF documents can make effective use of namespaces only because each triple contains all necessary context to interpret that triple. In general, XML processing depends on a great deal of document context.

If XML namespaces are only being used to distinguish one kind of sub-document from another, all they are doing is competing with MIME types. Which is better is a discussion that revolves around the need for a large namespace for document types versus the need to be able to infer super-type information such as "+xml" or "text/" from the type identifier.

I have started gathering notes on restwiki as to how XML documents should be used in machine-to-machine communication. My guidance at the moment is to ignore xml namespaces completely.

Posted by Benjamin Carlyle on 17 Nov 2007 @ 01:17pm UTC #
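
Benjamin Carlyle mentions super-type inference, which can be done from a media type string alone; a namespace URI offers no such hints. Here is a minimal Python sketch; the function name and returned fields are hypothetical. It recognizes XML by the `+xml` suffix or by the two generic XML types, as described in RFC 7303.

```python
def describe_media_type(media_type):
    """Extract the super-type hints carried by a media type string.

    Parameters such as "; charset=utf-8" are dropped, and the comparison is
    case-insensitive because type and subtype names are.
    """
    essence = media_type.split(";", 1)[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    return {
        "top_level": top_level,
        # "+xml" suffix, or one of the generic XML types (RFC 7303).
        "is_xml": subtype.endswith("+xml") or essence in ("application/xml", "text/xml"),
        "is_text": top_level == "text",
    }


print(describe_media_type("image/svg+xml"))
# {'top_level': 'image', 'is_xml': True, 'is_text': False}
print(describe_media_type("text/html; charset=utf-8"))
# {'top_level': 'text', 'is_xml': False, 'is_text': True}
```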

In my opinion implicit namespaces could definitely be a useful concept, but obviously a number of problems will arise with processing of content served with an incorrect MIME media type. It is a sad fact that most users nowadays tend to use MIME types that do not correspond to the actual content.

Posted by Sebastian Snopek on 19 Nov 2007 @ 10:22pm UTC #

The idea seems quite interesting and surely worth exploring further. However, I'm a bit sceptical whether introducing such innovations is really necessary. They would surely make our life easier and the code would be simplified a bit, but I think that using explicit namespaces gives us the assurance of valid processing of the code, while implicit inferring is somewhat risky.

Posted by Irek Wajdylo on 29 Dec 2007 @ 01:35pm UTC #

I'd really like to see other languages allowed in HTML. Use of SVG especially is a very tempting feature. I'm actually at odds with defaulting in general. Perhaps it helps many users make the code look neater, but then multiple complications too often tend to come about. The danger of losing the media type when storing a file on a local system particularly bothers me. Well, time will tell if any of the namespace solutions catches on.

Posted by Andrea Crevola on 30 Dec 2007 @ 01:17pm UTC #

The idea of incorporating other languages into HTML is surely very tempting (I'm in favour of SVG usage myself), but I am a bit sceptical about the whole defaulting thing. It is probably a comfortable solution to set a default, but it can lead to many problems later, as users are extremely inventive as far as breaking things and writing invalid code are concerned.

Posted by Benjamin Fringe on 15 Feb 2008 @ 07:39pm UTC #

Seems to me that “implicit namespaces” would only encourage lazy programming (writing invalid code). This is already a real problem.

Instead of learning a namespace and calling it when they really want it, this would make it easier to grab pieces of code from other sites, add them to the HTML and, if they seem to work, leave them there, never knowing what they're really supposed to do. When it stops working, they complain it's the browser's fault.

Posted by ADAC on 22 Apr 2008 @ 01:16am UTC #

I think you make a good point, but I can't even fathom the load of work that needs to be done to allow inferences.

Posted by Natalia on 15 Jun 2008 @ 09:34am UTC #

That might be a good concept, but implicit namespaces may cause problems when content is disassociated from a MIME media type.

Posted by Kamil on 18 Jun 2008 @ 09:41pm UTC #

I am not sure if we need that at all; in my opinion it would make a simple markup language very complicated. On the other hand, I agree with @ADAC: wrong use of namespaces would encourage "lazy programming" and in the end cause more harm than good. Well, this is just my opinion.

Posted by Greg on 07 Sep 2008 @ 05:16am UTC #

I am of the opinion that implicit inferring, though more convenient, can be a bit dangerous compared to explicit namespaces, which do give you the assurance of valid processing.

Posted by Piotr on 14 Dec 2008 @ 12:10am UTC #
Comments on this essay are closed. Thank you, spammers.