I've argued against escaped markup in several forums; it's time to stop for a while. Either I've made my points or I haven't, and repeating myself won't help. But since a number of people have suggested that I'm not proposing any solutions, here are some solutions. And a challenge, or at least an exercise that I think might be interesting.
Optimism is an occupational hazard of programming: testing is the treatment.
I've written about this a few times now, enough to warrant a thread (even though I've mostly abandoned threading), and I think I've said just about all I can usefully say.
Apparently I still haven't explained clearly enough what I think the alternatives are. I'll try to rectify that in this essay.
But first, a quick recap.
I think escaped markup is inherently dangerous and must be outlawed in Atom and all other specifications. In brief:
It moves content that one could reasonably desire to address with XML tools into a realm where those tools do not and cannot operate.
It is, at best, a partial solution to the problem. It fails to address encoding and other internationalization issues.
It encourages naive users to believe that escaped markup is an acceptable solution to the general problem of how to stick markup where a schema says they may not.
The last point, in particular, makes it dangerous. The first two just make it a nasty kludge.
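To make the first point concrete, here's a small Python sketch using the standard library's xml.etree.ElementTree. The two entries and the find_links helper are hypothetical: the same link appears once as escaped markup and once as real elements, and only the second is visible to an XML tool.

```python
import xml.etree.ElementTree as ET

# Two hypothetical entries carrying the same link: one as escaped markup,
# one as real, well-formed elements.
ESCAPED = """<entry>
  <content>&lt;p&gt;See &lt;a href="http://example.org/"&gt;this&lt;/a&gt;&lt;/p&gt;</content>
</entry>"""

INLINE = """<entry>
  <content><p>See <a href="http://example.org/">this</a></p></content>
</entry>"""


def find_links(entry_xml: str) -> list[str]:
    """Return the href of every <a> element that an XML parser can see."""
    root = ET.fromstring(entry_xml)
    return [a.get("href") for a in root.iter("a")]


if __name__ == "__main__":
    print(find_links(ESCAPED))  # → []
    print(find_links(INLINE))   # → ['http://example.org/']
    # In the escaped entry the "link" is only a string inside a text node.
    print(ET.fromstring(ESCAPED).find("content").text)
```

The escaped version still contains the link, but only as characters; getting at it means handing the text to a second (probably HTML) parser, outside the XML toolchain entirely.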
And for the record, I strongly object to the insinuation that my opinion on this matter demonstrates ivory-tower thinking. I'm desperately worried about the practical ramifications of escaped markup.
So what are the alternatives?
Stick to plain text; don't even try to put any markup in there.
I think that's a marginally acceptable solution for Atom applications that are publishing abstracts and pointers, as most of the feeds I read seem to do.
If the schema for your Atom variant of choice defines the content of an element so that it can only contain text, this is what you must do. That's what it means to have schema constraints.
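If the schema says text, the feed generator has to reduce whatever markup it starts with to plain text. Here's a crude sketch, assuming the source is an HTML fragment and using Python's standard html.parser module; to_plain_text, summary_element and the summary element name are illustrative, not taken from any Atom draft. Links, images and emphasis are simply lost.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Tags whose boundaries should become word breaks once the tags are gone.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "blockquote", "tr", "td"}


class TextOnly(HTMLParser):
    """Keep character data, throw away every tag (a deliberately crude helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; etc. become plain characters
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)


def to_plain_text(html_fragment: str) -> str:
    parser = TextOnly()
    parser.feed(html_fragment)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.chunks).split())


def summary_element(html_fragment: str) -> ET.Element:
    summary = ET.Element("summary")  # illustrative element name
    summary.text = to_plain_text(html_fragment)  # text only, no child elements
    return summary


if __name__ == "__main__":
    el = summary_element("<p>Fish &amp; chips</p><p>are <em>tasty</em>.</p>")
    print(ET.tostring(el, encoding="unicode"))
```

That prints <summary>Fish &amp; chips are tasty.</summary>. The &amp; there is ordinary XML escaping of a single character; there's no markup hiding in the text.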
Allow markup and insist that it be well-formed.
This is arguably the hardest thing to do, but it's not really that hard, is it? For any piece of content that you want to publish in your feed, you have to run it through some utility to make it well-formed. I argue that such a transformation is not significantly harder than the transformation needed to properly handle escaping.
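The utility doesn't have to be sophisticated to guarantee well-formedness. Here's a rough Python sketch (WellFormer and make_well_formed are hypothetical names) that re-parses tag soup with the standard html.parser module and rebuilds it as an ElementTree, so the serialized result always has properly nested, closed elements. Wrapping the result in an XHTML-namespace div is an assumption about what the feed wants. It implements only a sliver of HTML's implied-end-tag rules and quietly drops comments and declarations; a real tool such as HTML Tidy does much more, and pathological input (odd characters in tag or attribute names, control characters in text) could still slip through.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# HTML elements that never have content, so they are never left "open".
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "wbr"}
# A tiny subset of HTML's implied end tags: a new <p> closes an open <p>, etc.
IMPLIED_CLOSE = {"p", "li"}


class WellFormer(HTMLParser):
    """Rebuild tag soup as an ElementTree; a crude stand-in for a real tidy tool."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ET.Element("div", {"xmlns": XHTML_NS})
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in IMPLIED_CLOSE and self.stack[-1].tag == tag:
            self.stack.pop()
        # Boolean attributes (<input checked>) arrive with the value None.
        attrib = {name: (value if value is not None else name) for name, value in attrs}
        element = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything still open
        # inside it; end tags that match nothing are ignored.
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth].tag == tag:
                del self.stack[depth:]
                return

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

    # Comments, processing instructions and <!DOCTYPE ...> declarations go to
    # HTMLParser's default handlers, which ignore them.


def make_well_formed(html_fragment: str) -> str:
    parser = WellFormer()
    parser.feed(html_fragment)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    print(make_well_formed("<p>Fish & chips<br>are tasty<p>Really</i> <b>tasty"))
```

For that input the result is a div holding two sibling p elements: the second <p> closes the first, the stray </i> is dropped, the bare & is written as &amp;, and the dangling <b> is closed for you.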
If the content you want to syndicate really contains markup that you can't represent in XML (such as document type declarations), I think there are three options: use MIME or some other mechanism to make them proper attachments, leave them on the net somewhere and point to them, or base64 encode them.
What, some demand, is the gain of base64 encoding? I'll tell you what the gain is: human authors will not be encouraged to write base64 by hand. They will not imagine that trivially escaped markup is the right answer in other problem domains where they want to put markup in fields that the schema constrains to text.
It offers no technical gain for the machines (but no significant cost, either), and it tremendously improves the semantics for end users.
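Base64 also forces the encoding question into the open, because it operates on bytes rather than characters. A minimal Python sketch: base64_content and decode_content are hypothetical helpers, and the type, mode and charset attribute names are illustrative only. The HTML page, with its DOCTYPE and a non-ASCII character, survives the round trip through XML intact.

```python
import base64
import xml.etree.ElementTree as ET

# A hypothetical HTML page that cannot be carried as XML content: a DOCTYPE
# declaration is not allowed inside an element (and the markup is tag soup).
PAGE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html><body><p>Caf\u00e9 <br> menu</body></html>"""


def base64_content(document: str, media_type: str = "text/html",
                   charset: str = "utf-8") -> ET.Element:
    """Wrap an arbitrary document as base64 text inside an XML element."""
    # The character encoding must be chosen explicitly before encoding,
    # so it is recorded next to the data (attribute names are illustrative).
    payload = base64.b64encode(document.encode(charset)).decode("ascii")
    content = ET.Element("content", {"type": media_type, "mode": "base64",
                                     "charset": charset})
    content.text = payload
    return content


def decode_content(content: ET.Element) -> str:
    """Recover the original document from an element built by base64_content."""
    return base64.b64decode(content.text).decode(content.get("charset"))


if __name__ == "__main__":
    xml_text = ET.tostring(base64_content(PAGE), encoding="unicode")
    print(xml_text)
    # Parse the serialized element back and recover the original page.
    assert decode_content(ET.fromstring(xml_text)) == PAGE
```

Nobody is going to type that payload by hand, which is rather the point.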
I'd like to try a little experiment. Here are two documents; neither is well-formed XML, but both display “correctly” in my browser (Mozilla Firebird on Linux):
Personally, I would syndicate just the abstracts, but I could syndicate the entire contents, if that's what was required.
If you think escaped markup is the answer, what does your feed look like? If you have tools that build your feed automatically, what do they do with these files?
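One possible answer, following the alternatives above, is a tool that embeds content as real XML when it's well-formed and base64-encodes it when it isn't, never falling back to escaping. A Python sketch under those assumptions; is_well_formed, content_for, the wrapper element and the mode values are all illustrative.

```python
import base64
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML once wrapped in a single root element."""
    try:
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
    except ET.ParseError:
        return False
    return True


def content_for(fragment: str) -> ET.Element:
    """Embed well-formed markup as real elements; base64-encode anything else.

    A hypothetical policy: escaped markup is never produced.
    """
    content = ET.Element("content")
    if is_well_formed(fragment):
        wrapper = ET.fromstring(f"<wrapper>{fragment}</wrapper>")
        content.set("mode", "xml")
        content.text = wrapper.text
        content.extend(list(wrapper))  # children carry their tails along
    else:
        content.set("mode", "base64")
        content.text = base64.b64encode(fragment.encode("utf-8")).decode("ascii")
    return content


if __name__ == "__main__":
    for sample in ("<p>Hello <b>world</b></p>", "<p>unclosed", "Fish &nbsp; chips"):
        print(ET.tostring(content_for(sample), encoding="unicode"))
```

An HTML entity such as &nbsp; is enough to send a fragment down the base64 path, since XML doesn't predefine it; running content through a well-forming step first would keep more of it as real XML.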
Substitute your favorite Son-of-RSS name for Atom; I'm agnostic.