<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<?xml-stylesheet type='text/xsl' href='/style/atom-comments.xsl'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<title>norman.walsh.name: Comments on /2003/06/30/hardline</title>
<link rel='alternate' type='text/html' href='http://norman.walsh.name/2003/06/30/hardline'/>
<id>http://norman.walsh.name/2003/06/30/hardline/comments.atom</id>
<updated>2004-09-09T21:48:36Z</updated>

<entry xmlns:foaf='http://xmlns.com/foaf/0.1/'>
<title>Comment 0001 on /2003/06/30/hardline</title>
<link rel='alternate' type='text/html' href='http://norman.walsh.name/2003/06/30/hardline#comment0001'/>
<id>http://norman.walsh.name/2003/06/30/hardline#comment0001</id>
<published>2003-06-30T17:11:53Z</published>
<updated>2003-06-30T17:11:53Z</updated>
<author>
  <name>Karl Ove Hufthammer</name>
  <foaf:mbox_sha1sum>465b51e744e9f54965b945757c64e47cc8dff859</foaf:mbox_sha1sum>
  <uri>http://blogg.huftis.org/</uri>
</author>
<content type='xhtml'><div xmlns="http://www.w3.org/1999/xhtml"><p>For what it's worth, I'm appalled too. All the talk some time ago on various RSS lists about escaping, and even double-escaping, HTML markup in RSS literally made me sick inside, and I had to take a long break from involving myself in anything RSS-related. And now it looks like the same mistakes will be made all over again with Echo.</p>
<p>But to comment more on the technical side of things: as far as I can see, much confusion seems to stem from a general lack of understanding of what entities (and entity references), characters, (numeric) character references, bytes and CDATA sections really *are* and how they work in an 'XML framework'. They're all just mechanisms for writing ordinary character data (i.e. 'text'). (And entities can also be used as a simple macro language for inserting frequently used text blocks easily.)</p>
<p>XML already *has* a well thought-out mechanism for mixing various XML vocabularies. It's called XML namespaces, and nothing could be simpler to use. To embed a piece of XHTML (or any other XML-based content language, e.g. MathML) in an XML document, such as an Echo document, you just include the relevant piece directly (or by parsing and serialising, to get rid of any unresolved entity references), and put a namespace declaration on it. Example:</p>
<p>&lt;description&gt;&lt;p xmlns="http://www.w3.org/1999/xhtml"&gt;This is &lt;em&gt;my&lt;/em&gt; description&lt;/p&gt;&lt;/description&gt;</p>
<p>(Since the content may span several paragraphs, it'll probably be a good idea to use the XHTML 'body' element as a wrapper element.)</p>
<p>While it's extremely easy to write, it's even easier to parse. All XML parsers support namespaces, and you just have to dump the contents of each element (XSLT example: &lt;xsl:value-of select="description"/&gt;) to have a nice, readable plain-text version (not all RSS/Echo clients will have an XHTML and CSS rendering engine included). And if the RSS/Echo client works by building an XHTML document and then sending it to the default browser, you just have to serialise the 'description' element again (an identity transformation in XSLT, and likely a one-liner in most XML tools/parsers).</p>
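<p>A minimal sketch of both steps in Python rather than XSLT, assuming the standard library's xml.etree.ElementTree module. The FEED string and the helper names plain_text and embedded_markup are hypothetical, made up for illustration:</p>
<pre><![CDATA[
```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical feed fragment; the wrapper element "item" is an assumption,
# "description" follows the example in the comment above.
FEED = (
    '<item><description>'
    '<p xmlns="http://www.w3.org/1999/xhtml">This is <em>my</em> description</p>'
    '</description></item>'
)


def plain_text(description):
    """Concatenate every text node below the element, like xsl:value-of."""
    return "".join(description.itertext())


def embedded_markup(description):
    """Serialise the child elements of the element back to XML."""
    # Map the XHTML namespace to the default (empty) prefix so the output
    # reads <p xmlns="..."> instead of <ns0:p xmlns:ns0="...">.
    ET.register_namespace("", XHTML_NS)
    return "".join(ET.tostring(child, encoding="unicode") for child in description)


description = ET.fromstring(FEED).find("description")
text = plain_text(description)         # plain-text version for simple clients
markup = embedded_markup(description)  # XHTML to hand to a rendering engine
```
]]></pre>
<p>Here text holds the plain-text version and markup holds the re-serialised XHTML. Note that register_namespace changes module-wide state, which is acceptable for a sketch but worth knowing about in a larger program.</p>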
<p>Again, nothing could be simpler.</p></div></content>
</entry>

<entry xmlns:foaf='http://xmlns.com/foaf/0.1/'>
<title>Comment 0002 on /2003/06/30/hardline</title>
<link rel='alternate' type='text/html' href='http://norman.walsh.name/2003/06/30/hardline#comment0002'/>
<id>http://norman.walsh.name/2003/06/30/hardline#comment0002</id>
<published>2003-06-30T19:43:51Z</published>
<updated>2003-06-30T19:43:51Z</updated>
<author>
  <name>Tobi Reif</name>
  <foaf:mbox_sha1sum>def56da92d663527352d4ec18b9fa8c34ba5c90a</foaf:mbox_sha1sum>
  <uri>http://www.pinkjuice.com/</uri>
</author>
<content type='xhtml'><div xmlns="http://www.w3.org/1999/xhtml"><p>You write</p>
<p>"And if you're expected to recover from broken markup (an expectation that I thought was squashed once and for all in 1998), not only is it much harder still, it's a playground for all sorts of miscreants to devise trojan horses and other mischief."</p>
<p>but the source of the very same article, while sporting an XML doctype declaration</p>
<p>&lt;!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"&gt;</p>
<p>, includes around ten structural errors such as</p>
<p>&lt;p&gt;Again, nothing could be simpler.&lt;/div&gt;</p>
<p>, which make it malformed.</p>
<p>Check http://snurl.com/1oze .</p>
<p>Preaching well-formed XML will work much better when written in well-formed XML :)</p>
<p>Tobi</p>
<p>P.S.
XHTML, just as RSS, is being parsed, processed, transformed, filtered, indexed, etc.</p></div></content>
</entry>

<entry>
<title>Comment 0003 on /2003/06/30/hardline</title>
<link rel='alternate' type='text/html' href='http://norman.walsh.name/2003/06/30/hardline#comment0003'/>
<id>http://norman.walsh.name/2003/06/30/hardline#comment0003</id>
<published>2003-06-30T20:00:01Z</published>
<updated>2003-06-30T20:00:01Z</updated>
<author>
  <name>Norman Walsh</name>
</author>
<content type='xhtml'><div xmlns="http://www.w3.org/1999/xhtml"><p>Ah. Bleh. The article is well formed and, in fact, valid. But the talkback comments are not. I'll have to fiddle my CGI script a bit.</p></div></content>
</entry>

<entry xmlns:foaf='http://xmlns.com/foaf/0.1/'>
<title>Comment 0004 on /2003/06/30/hardline</title>
<link rel='alternate' type='text/html' href='http://norman.walsh.name/2003/06/30/hardline#comment0004'/>
<id>http://norman.walsh.name/2003/06/30/hardline#comment0004</id>
<published>2003-06-30T20:49:39Z</published>
<updated>2003-06-30T20:49:39Z</updated>
<author>
  <name>Tobi Reif</name>
  <foaf:mbox_sha1sum>cd6b54f77231d00424ea2e89c9df4bf26d743582</foaf:mbox_sha1sum>
  <uri>http://www.pinkjuice.com/</uri>
</author>
<content type='xhtml'><div xmlns="http://www.w3.org/1999/xhtml"><p>Norm</p>
<p>English is not my native language, so I might miss the meaning, implications, and finer nuances of "Ah." and "Bleh.".</p>
<p>I just thought I'd let you know that the page of the article at
http://norman.walsh.name/2003/06/30/hardline
is/was invalid (since not well-formed).</p>
<p>I hope you didn't take it as an offense, or nitpicking; the report was meant as helpful feedback.</p>
<p>I very much share both your opinions:
1. Including escaped markup in RSS is not a good idea; this should be done via namespaces.
2. Everything that says it's XML should be well-formed (IMHO, even valid, e.g. with respect to some standard).</p>
<p>Tobi</p></div></content>
</entry>

<entry>
<title>Comment 0005 on /2003/06/30/hardline</title>
<link rel='alternate' type='text/html' href='http://norman.walsh.name/2003/06/30/hardline#comment0005'/>
<id>http://norman.walsh.name/2003/06/30/hardline#comment0005</id>
<published>2003-06-30T21:55:41Z</published>
<updated>2003-06-30T21:55:41Z</updated>
<author>
  <name>Norman Walsh</name>
</author>
<content type='xhtml'><div xmlns="http://www.w3.org/1999/xhtml"><p>No offense taken, Tobi, I appreciate the report. Most of the uglier problems were in the feedback comments. I've fixed the CGI script that includes them. I've also fixed a couple of other HTML bugs.</p>
<p>I believe all the pages are valid now.</p></div></content>
</entry>

<entry>
<title>Comment 0006 on /2003/06/30/hardline</title>
<link rel='alternate' type='text/html' href='http://norman.walsh.name/2003/06/30/hardline#comment0006'/>
<id>http://norman.walsh.name/2003/06/30/hardline#comment0006</id>
<published>2003-07-01T10:00:40Z</published>
<updated>2003-07-01T10:00:40Z</updated>
<author>
  <name>Damian Cugley</name>
  <uri>http://www.alleged.org.uk/pdc/</uri>
</author>
<content type='xhtml'><div xmlns="http://www.w3.org/1999/xhtml"><p>The RSS feed for my utterly unimportant web site uses plain text descriptions (stripping out all mark-up) for just that reason -- I do not want to get enmeshed in the ambiguity of escaped HTML content.</p>
<p>My attempts to quickly create RSS readers have been thwarted: I cannot process RSS in XSLT (my tool of choice) because the content is escaped and, often, not well-formed.  My friends' RSS feeds are generated by a program that takes the first N characters of the HTML data as title -- often slicing a tag in half.</p>
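<p>To make the slicing problem concrete, here is a rough sketch in Python, assuming the entry body is a well-formed XHTML fragment. The function name title_from_markup and the sample string are hypothetical, and this is not the program my friends use:</p>
<pre><![CDATA[
```python
import xml.etree.ElementTree as ET


def title_from_markup(xhtml_fragment, limit):
    """Build a title from the text content only, never from the raw markup."""
    root = ET.fromstring(xhtml_fragment)
    # Join all text nodes and collapse runs of whitespace into single spaces.
    text = " ".join("".join(root.itertext()).split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."


# Hypothetical entry body used only for illustration.
raw = '<p>An <a href="http://example.org/">example link</a> in a post</p>'

naive_title = raw[:12]                   # cuts the <a> tag in half
safe_title = title_from_markup(raw, 12)  # truncates text, not tags
```
]]></pre>
<p>Truncating after the mark-up has been parsed means a tag can never be cut in two, at the cost of requiring well-formed input in the first place.</p>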
<p>The debate over allowing (or even *requiring*) HTML to be escaped as CDATA is a symptom of the document vs. data views of XML.  Having used the phrase "Echo document", naturally you expect the content to be part of the document.  People who say "Echo datastream" think the content of the entry "is just data" and should be escaped...</p>
<p>There is a weird minority who think that doing escaping with CDATA (rather than entities) makes it more OK because a human reading the document will be able to see the mark-up as mark-up, even though the processing application cannot.  IMHO, the processing application needs to see the mark-up more than human readers do!</p></div></content>
</entry>

</feed>
