<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       version="pto">
<info>
<title>Escaped Markup: What To Do Instead</title>
<volumenum>6</volumenum>
<issuenum>86</issuenum>
<pubdate>2003-09-18</pubdate>
<date>$Date: 2005-12-14 12:24:55 -0500 (Wed, 14 Dec 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2003</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>I've argued against escaped markup in several forums; it's time to
stop for a while. Either I've made my points or I haven't, and repeating
myself won't help. But since a number of people have suggested that
I'm not proposing any solutions, here are some solutions. And a
challenge, or at least an exercise that I think might be
interesting.</para>
</abstract>
</info>

<epigraph>
<attribution>K. Beck</attribution>
<para xml:id='p1'>Optimism is an occupational hazard of programming:
testing is the treatment.
</para>
</epigraph>

<para xml:id='p2'>I've written about this a few times now, enough to warrant
<link xlink:href="/threads/escMarkup">a thread</link>
(even though I've mostly abandoned threading),
and I think I've said just about all I can usefully say.</para>

<para xml:id='p3'><link xlink:href="http://www.xml.com/cs/user/view/cs_msg/1472">Apparently</link>
I still haven't specified what
I think the alternatives are in a clear enough fashion. I'll try to
rectify that in this essay.</para>

<para xml:id='p4'>But first, a quick recap.</para>

<para xml:id='p5'>I think escaped markup is inherently dangerous and must be outlawed
in Atom<footnote><para xml:id='p6'>Substitute your favorite Son-of-RSS name for Atom; I'm
agnostic.</para></footnote> and all other specifications. In brief:</para>

<orderedlist>
<listitem><para xml:id='p7'>It moves content that one could reasonably desire to address
with XML tools into a realm where those tools do not and cannot operate.
</para></listitem>
<listitem><para xml:id='p8'>It is, at best, a partial solution to the problem. It fails to
address encoding and other internationalization issues.
</para></listitem>
<listitem><para xml:id='p9'>It encourages naive users to believe that escaped markup is
an acceptable
solution to the general problem of how to stick markup where a schema says
they may not.
</para></listitem>
</orderedlist>

<para xml:id='p10'>The last point, in particular, makes it dangerous. The first two just
make it a nasty kludge.</para>
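
<para>The first objection can be made concrete with a minimal sketch. It
assumes Python and its standard <code>xml.etree.ElementTree</code> parser; the
<code>content</code> element and both sample strings are hypothetical
illustrations, not taken from any Atom draft. The same query is run against an
escaped payload and a well-formed one:</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

# Hypothetical feed fragments; the element names are illustrative only.
ESCAPED = ("<content>&lt;p&gt;See &lt;a href='http://example.org/'&gt;"
           "this&lt;/a&gt;.&lt;/p&gt;</content>")
WELL_FORMED = ("<content><p>See <a href='http://example.org/'>"
               "this</a>.</p></content>")


def links(xml_text):
    """Return the href of every <a> element the XML parser can see."""
    root = ET.fromstring(xml_text)
    return [a.get("href") for a in root.iter("a")]


escaped_links = links(ESCAPED)     # the parser sees one opaque text node
real_links = links(WELL_FORMED)    # the parser sees a real <a> element
print(escaped_links, real_links)   # [] ['http://example.org/']
]]></programlisting>

<para>Both fragments parse, but only the second exposes its link to the
parser. To reach the link in the first, a consumer has to unescape the text
and parse it a second time, with no guarantee that the result is
well-formed.</para>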

<para xml:id='p11'>And for the record, I strongly object to the
suggestion that my opinion on this matter demonstrates
<link xlink:href="http://www.intertwingly.net/blog/1571.html#c1061476920">ivory tower</link>
thinking. I'm desperately worried about
the practical ramifications of escaped markup.</para>

<para xml:id='p12'>So what are the alternatives?</para>

<orderedlist>
<listitem>
<para xml:id='p13'>Stick to plain text; don't even try to put any markup in there.</para>
<para xml:id='p14'>I think that's a marginally acceptable solution for Atom applications
that are publishing abstracts and pointers, as most of the feeds I read seem
to do.
</para>
<para xml:id='p15'>If the schema for your Atom variant of choice defines the content of
an element so that it can only contain text, this is what you <emphasis>must</emphasis>
do. That's what it means to have schema constraints.</para>
</listitem>
<listitem>
<para xml:id='p16'>Allow markup and insist that it be well-formed.</para>
<para xml:id='p17'>This is arguably the hardest thing to do, but it's not really that hard, is it?
For any piece of content that you want to publish in your feed, you have to
run it through some utility to make it well formed. I argue that such a transformation
is not significantly harder than the transformation needed to properly handle
escaping.</para>
</listitem>
<listitem>
<para xml:id='p18'>If the content you want to syndicate really contains markup that you
can't represent in XML (such as document type declarations), I think there are
three options: use MIME or some other mechanism to make them proper attachments,
leave them on the net somewhere and point to them, or base64 encode them.
</para>
<para xml:id='p19'>What, <link xlink:href="http://www.intertwingly.net/blog/1571.html#c1061478643">demand
some</link>, is the gain of base64 encoding?
I'll tell you what the gain is: human authors will
not be encouraged to write base64 by hand. They will not imagine that
trivially escaped markup is the right answer in other problem domains where
they want to put markup in fields that the schema constrains to text.</para>
<para xml:id='p20'>Base64 offers the machines no technical gain (and no significant cost, either),
but it makes the semantics far clearer to end users.</para>
</listitem>
</orderedlist>
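
<para>The second and third alternatives can be sketched in a few lines. What
follows is an illustration under stated assumptions, not a recommended
implementation: it uses only the Python standard library, and the names
<code>WellFormer</code>, <code>make_well_formed</code>,
<code>as_base64</code>, and the <code>content</code> wrapper element are all
hypothetical. It silently drops comments and declarations, does not check that
attribute names are legal XML names, and knows only a partial list of empty
HTML elements. A production feed would more plausibly rely on an established
tidying tool.</para>

<programlisting language="python"><![CDATA[import base64
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content in HTML (partial list, for illustration).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class WellFormer(HTMLParser):
    """Re-serialize tag soup as balanced, escaped markup (a sketch only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # output fragments
        self.open_tags = []    # stack of currently open element names

    def handle_starttag(self, tag, attrs):
        # A dict collapses duplicate attributes (the last one wins);
        # a bare attribute such as "checked" gets its own name as value.
        pairs = {k: (k if v is None else v) for k, v in attrs}
        text = "".join(f" {k}={quoteattr(v)}" for k, v in pairs.items())
        if tag in VOID:
            self.out.append(f"<{tag}{text}/>")
        else:
            self.out.append(f"<{tag}{text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag, or the end of a void element: ignore it
        while self.open_tags:
            name = self.open_tags.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and >


def make_well_formed(soup):
    """Option 2: balance and escape the markup before publishing it."""
    parser = WellFormer()
    parser.feed(soup)
    parser.close()
    while parser.open_tags:  # close whatever the author left open
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


def as_base64(raw_bytes):
    """Option 3: carry the original bytes, whatever their encoding."""
    return base64.b64encode(raw_bytes).decode("ascii")


soup = "<p>One<br>two <b>bold <i>both</b> &amp; more"
clean = make_well_formed(soup)
root = ET.fromstring(f"<content>{clean}</content>")  # now parses as XML

latin1 = "<p>caf\u00e9".encode("iso-8859-1")
packed = as_base64(latin1)  # decoding returns the identical bytes
]]></programlisting>

<para>The repair pass has roughly the shape of an escaping pass: read the
input, fix up the troublesome characters and tags, and write the result. The
base64 path never interprets the bytes at all, which is why it sidesteps the
question of the source document's encoding.</para>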

<para xml:id='p21'>I'd like to try a little experiment. Here are two documents. Neither
is well-formed XML, but both display <quote>correctly</quote> in my browser
(Mozilla Firebird on Linux):</para>

<itemizedlist>
<listitem>
<para xml:id='p22'><link xlink:href="examples/doc1.htm">doc1.htm</link> [sic] is an ISO 8859-1 document.</para>
</listitem>
<listitem>
<para xml:id='p23'><link xlink:href="examples/doc2.html">doc2.html</link> is a UTF-8 document.</para>
</listitem>
</itemizedlist>

<para xml:id='p24'>Personally, I would syndicate
<link xlink:href="examples/atom1.xml">just the abstracts</link>, but I could syndicate
the
<link xlink:href="examples/atom2.xml">entire contents</link>, if that's what was required.</para>

<para xml:id='p25'>If you think escaped markup is the answer, what does your feed look like?
If you have tools that build your feed automatically, what do they do with
these files?</para>


</essay>
