<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
<title>Drop the &lt;!DOCTYPE&gt;</title><biblioid class="uri">http://norman.walsh.name/2006/01/06/doctype</biblioid>
<volumenum>9</volumenum>
<issuenum>4</issuenum>
<pubdate>2006-01-06T10:59:37-05:00</pubdate>
<date>$Date: 2006-01-06 11:46:24 -0500 (Fri, 06 Jan 2006) $</date>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2006</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>If we're going to drop the document type declaration, we need to
provide something that behaves like entity expansion. With a little
XSLT 2.0, that's not hard. With a pipeline language, we could even
do it in a standard way.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XSLT2"/>
</info>

<para xml:id="p1"><personname>
      <firstname>Tim</firstname>
<surname role="suppress">Bray</surname>
    </personname> says we should
<link xlink:href="http://www.tbray.org/ongoing/When/200x/2005/12/15/Drop-the-Doctype">drop it</link>.
I've
<link xlink:href="/2005/12/16/xmlk">expressed sympathy</link> for that position.
But <personname>
      <firstname>Richard</firstname>
<surname role="suppress">Tobin</surname>
    </personname>
<link xlink:href="http://lists.w3.org/Archives/Public/public-xml-core-wg/2005Dec/0011.html">fired right back</link>: “Absolutely not!”</para>

<para xml:id="p2">Richard goes on to point out that until “there's some other
lightweight macro-like facility, DTDs are essential.” I'm not sure I'd
go so far as “essential”; I think I could live without them, but it
wouldn't be pleasant. For an example of why, take a look at the top of
your 
<link xlink:href="examples/specent.xml">average W3C specification</link>
authored in
<link xlink:href="http://www.w3.org/2002/xmlspec/">XML Spec</link>:</para>

<programlisting>&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;!DOCTYPE spec SYSTEM "http://www.w3.org/2002/xmlspec/dtd/2.10/xmlspec.dtd" [
&lt;!ENTITY draft.DD "05"&gt;
&lt;!ENTITY draft.MM "01"&gt;
&lt;!ENTITY draft.day "5"&gt;
&lt;!ENTITY draft.month "January"&gt;
&lt;!ENTITY draft.year "2006"&gt;
&lt;!ENTITY iso6.doc.date "&amp;draft.year;-&amp;draft.MM;-&amp;draft.DD;"&gt;
&lt;!ENTITY http-ident "http://example.org/TR/NOTE-example"&gt;
]&gt;
&lt;spec w3c-doctype='note'&gt;
&lt;header&gt;
&lt;title&gt;Example Specification&lt;/title&gt;
&lt;version&gt;Version 1.0&lt;/version&gt;
&lt;w3c-designation&gt;&amp;http-ident;-&amp;iso6.doc.date;&lt;/w3c-designation&gt;
&lt;w3c-doctype&gt;W3C NOTE&lt;/w3c-doctype&gt;
&lt;pubdate&gt;
&lt;day&gt;&amp;draft.day;&lt;/day&gt;
&lt;month&gt;&amp;draft.month;&lt;/month&gt;
&lt;year&gt;&amp;draft.year;&lt;/year&gt;
&lt;/pubdate&gt;
&lt;publoc&gt;
  &lt;loc href="&amp;http-ident;-&amp;iso6.doc.date;"&gt;&amp;http-ident;-&amp;iso6.doc.date;&lt;/loc&gt;
&lt;/publoc&gt;
&lt;altlocs&gt;
  &lt;loc href="&amp;http-ident;.XML"&gt;XML&lt;/loc&gt;
&lt;/altlocs&gt;
&lt;latestloc&gt;
  &lt;loc href="&amp;http-ident;"&gt;&amp;http-ident;&lt;/loc&gt;
&lt;/latestloc&gt;
…</programlisting>

<para xml:id="p3">You don't <emphasis>need</emphasis> all those entities, but
keeping all the date-related URIs and publication metadata accurate
sure would be more tedious without them. Especially when you consider
that as a specification develops it gathers a collection of “previous
locations” which all have dates too, so the header becomes a real date
soup.
</para>

<para xml:id="p4">And, of course, you don't <emphasis>need</emphasis> entity
expansion to accomplish this. You could use
<link xlink:href="http://en.wikipedia.org/wiki/M4_%28computer_language%29">m4</link>
or
<link xlink:href="http://en.wikipedia.org/wiki/C_preprocessor">cpp</link>
or any other text replacement tool, even simply 
<link xlink:href="http://en.wikipedia.org/wiki/Sed">sed</link>.
But those tools aren't XML-aware and really, you'd like to do this in
an XML-aware fashion. (You don't want to do the replacement in the
middle of an element name or produce well-formedness errors.)</para>

<para xml:id="p5">My solution to this problem was to whip up a little XSLT to do the
substitution. The stylesheet
<link xlink:href="examples/ml-macro.xsl">ml-macro.xsl</link><footnote>
<para xml:id="p6">I was <emphasis>very</emphasis> tempted to use
“xml-macro”, but names beginning with “xml” are reserved and my ego isn't quite big
enough to willfully break that rule.</para>
    </footnote>, searches
for macro names, delimited by two regular expressions, in attribute
values and text content, and (recursively) expands them. Macros can be
defined in the source document, in an external macro file, or directly
in the stylesheet. The last option can be used to build dynamic replacement
text, such as the current date and time.</para>

<para xml:id="p7">For my document collection, “[[” and “]]” are reasonable delimiters,
so I made them the default. You can change them, even on a per-document
basis.</para>
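
<para>To make the mechanics concrete, here is a minimal XSLT 2.0 sketch of
this kind of expansion. It is not <filename>ml-macro.xsl</filename> itself.
It assumes the default “[[” and “]]” delimiters are hard-wired, it only reads
macros defined by <tag class="xmlpi">ml-macro name="…" text="…"</tag>
processing instructions in the source document, and the <code>f:</code>
namespace and function names are invented for the illustration.</para>

<programlisting>&lt;xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.org/ns/macro-sketch"
    exclude-result-prefixes="xs f"&gt;

  &lt;!-- Macro definitions: every ml-macro PI in the source document. --&gt;
  &lt;xsl:variable name="defs" select="//processing-instruction('ml-macro')"/&gt;

  &lt;!-- Read one pseudo-attribute (name or text) out of a PI's data.
       Assumes double-quoted values, as in the examples. --&gt;
  &lt;xsl:function name="f:pseudo" as="xs:string"&gt;
    &lt;xsl:param name="pi" as="processing-instruction()"/&gt;
    &lt;xsl:param name="attr" as="xs:string"/&gt;
    &lt;xsl:sequence select="replace($pi,
        concat('^.*', $attr, '=&amp;quot;([^&amp;quot;]*)&amp;quot;.*$'), '$1', 's')"/&gt;
  &lt;/xsl:function&gt;

  &lt;!-- Expand every [[name]] in a string. Replacement text is expanded
       again, so macros may refer to other macros. Unknown names are
       left untouched. A circular definition would never terminate. --&gt;
  &lt;xsl:function name="f:expand" as="xs:string"&gt;
    &lt;xsl:param name="s" as="xs:string"/&gt;
    &lt;xsl:variable name="parts" as="xs:string*"&gt;
      &lt;xsl:analyze-string select="$s" regex="\[\[(.+?)\]\]"&gt;
        &lt;xsl:matching-substring&gt;
          &lt;xsl:variable name="name" select="regex-group(1)"/&gt;
          &lt;xsl:variable name="def"
              select="$defs[f:pseudo(., 'name') = $name][1]"/&gt;
          &lt;xsl:sequence select="if ($def)
                                then f:expand(f:pseudo($def, 'text'))
                                else ."/&gt;
        &lt;/xsl:matching-substring&gt;
        &lt;xsl:non-matching-substring&gt;
          &lt;xsl:sequence select="."/&gt;
        &lt;/xsl:non-matching-substring&gt;
      &lt;/xsl:analyze-string&gt;
    &lt;/xsl:variable&gt;
    &lt;xsl:sequence select="string-join($parts, '')"/&gt;
  &lt;/xsl:function&gt;

  &lt;!-- Identity copy for everything not handled below. --&gt;
  &lt;xsl:template match="@*|node()"&gt;
    &lt;xsl:copy&gt;
      &lt;xsl:apply-templates select="@*|node()"/&gt;
    &lt;/xsl:copy&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Macro definitions do not belong in the output. --&gt;
  &lt;xsl:template match="processing-instruction('ml-macro')" priority="1"/&gt;

  &lt;!-- Expand macros in text content. --&gt;
  &lt;xsl:template match="text()" priority="1"&gt;
    &lt;xsl:value-of select="f:expand(.)"/&gt;
  &lt;/xsl:template&gt;

  &lt;!-- Expand macros in attribute values. --&gt;
  &lt;xsl:template match="@*" priority="1"&gt;
    &lt;xsl:attribute name="{name()}" namespace="{namespace-uri()}"
                   select="f:expand(.)"/&gt;
  &lt;/xsl:template&gt;

&lt;/xsl:stylesheet&gt;</programlisting>

<para>Because the expansion works on text nodes and attribute values rather
than on raw characters, it cannot touch element names or break
well-formedness. With definitions like those in the specification header,
<code>[[http-ident]]-[[iso6.doc.date]]</code> should come out as
<code>http://example.org/TR/NOTE-example-2006-01-05</code>.</para>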

<para xml:id="p8">The stylesheet recognizes the following constructs:</para>

<variablelist>
<varlistentry>
<term>
	<tag class="xmlpi">ml-macro name="macroname" text="replacement text"</tag>
	<footnote>
<para xml:id="p9">Yes, I'm using processing instructions. I think they're the right tool
for this job. If PIs offend your aesthetic sensibilities, get over it.</para>
</footnote>
      </term>
<listitem>
<para xml:id="p10">Defines the macro “macroname” with the replacement text “replacement text”.
The replacement text may contain other macros, but they must not be used
recursively.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
	<tag class="xmlpi">ml-macro href="someURI"</tag>
      </term>
<listitem>
<para xml:id="p11">Loads macros defined externally in “someURI”. That document should
consist of an <tag>ml:collection</tag> element containing one or more
<tag>ml:macro</tag> elements. Each <tag>ml:macro</tag> element has a 
mandatory <tag class="attribute">name</tag> attribute containing the name
of the macro. The content of the element is the replacement text. In
this case, the replacement text can be any well-formed XML fragment,
including element content. The replacement text may contain other
macros, but they must not be used recursively.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
	<tag class="xmlpi">ml-macro-odre</tag>
      </term>
<listitem>
<para xml:id="p12">Defines the open delimiter regular expression. The default is
effectively <tag class="xmlpi">ml-macro-odre \[\[</tag>.
</para>
</listitem>
</varlistentry>

<varlistentry>
<term>
	<tag class="xmlpi">ml-macro-cdre</tag>
      </term>
<listitem>
<para xml:id="p13">Defines the close delimiter regular expression. The default is
effectively <tag class="xmlpi">ml-macro-cdre \]\]</tag>.
</para>
</listitem>
</varlistentry>
</variablelist>
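
<para>As an illustration of the external form, a macro file along these
lines would fit the description above. The file name
<filename>macros.xml</filename>, the macro values, and in particular the
namespace URI bound to <code>ml</code> are placeholders. The URI the
stylesheet actually expects is not given here, so check
<filename>ml-macro.xsl</filename> before relying on it.</para>

<programlisting>&lt;!-- macros.xml (hypothetical); the namespace URI is a placeholder --&gt;
&lt;ml:collection xmlns:ml="http://example.org/ns/ml-macro"&gt;
  &lt;ml:macro name="draft.year"&gt;2006&lt;/ml:macro&gt;
  &lt;ml:macro name="http-ident"&gt;http://example.org/TR/NOTE-example&lt;/ml:macro&gt;
  &lt;!-- Replacement text may contain elements and other macros. --&gt;
  &lt;ml:macro name="status"&gt;&lt;emph&gt;Working Draft&lt;/emph&gt; of [[draft.year]]&lt;/ml:macro&gt;
&lt;/ml:collection&gt;</programlisting>

<para>A document could then load that file and, if “[[” and “]]” clash with
its content, switch to “{{” and “}}”. This assumes the delimiter processing
instructions take the regular expression as their entire data, as the
defaults suggest:</para>

<programlisting>&lt;?ml-macro href="macros.xml"?&gt;
&lt;?ml-macro-odre \{\{?&gt;
&lt;?ml-macro-cdre \}\}?&gt;</programlisting>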

<para xml:id="p14">Using this approach, the specification shown above
<link xlink:href="examples/specmac.xml">becomes</link>:</para>

<programlisting>&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;?ml-macro name="draft.DD"    text="05"?&gt;
&lt;?ml-macro name="draft.MM"    text="01"?&gt;
&lt;?ml-macro name="draft.day"   text="5"?&gt;
&lt;?ml-macro name="draft.month" text="January"?&gt;
&lt;?ml-macro name="draft.year"  text="2006"?&gt;
&lt;?ml-macro name="iso6.doc.date" text="[[draft.year]]-[[draft.MM]]-[[draft.DD]]"?&gt;
&lt;?ml-macro name="http-ident"  text="http://example.org/TR/NOTE-example"?&gt;
&lt;spec w3c-doctype='note'&gt;
&lt;header&gt;
&lt;title&gt;Example Specification&lt;/title&gt;
&lt;version&gt;Version 1.0&lt;/version&gt;
&lt;w3c-designation&gt;[[http-ident]]-[[iso6.doc.date]]&lt;/w3c-designation&gt;
&lt;w3c-doctype&gt;W3C NOTE&lt;/w3c-doctype&gt;
&lt;pubdate&gt;
&lt;day&gt;[[draft.day]]&lt;/day&gt;
&lt;month&gt;[[draft.month]]&lt;/month&gt;
&lt;year&gt;[[draft.year]]&lt;/year&gt;
&lt;/pubdate&gt;
&lt;publoc&gt;
  &lt;loc href="[[http-ident]]-[[iso6.doc.date]]"&gt;[[http-ident]]-[[iso6.doc.date]]&lt;/loc&gt;
&lt;/publoc&gt;
&lt;altlocs&gt;
  &lt;loc href="[[http-ident]].XML"&gt;XML&lt;/loc&gt;
&lt;/altlocs&gt;
&lt;latestloc&gt;
  &lt;loc href="[[http-ident]]"&gt;[[http-ident]]&lt;/loc&gt;
&lt;/latestloc&gt;
…</programlisting>

<para xml:id="p15">This works and I'm going to start using it. With the addition of
<link xlink:href="http://en.wikipedia.org/wiki/XInclude">XInclude</link>
to replace external parsed entities (and some uses of external
unparsed entities), this approach seems to satisfy the requirements met
by entity expansion. Except, of course, for the fact that it uses a new
syntax, requires two passes, and isn't supported in any standard way.</para>

<para xml:id="p16">On the last point, I hope that when the work of the
<link xlink:href="http://www.w3.org/XML/Processing/">XML Processing Model
Working Group</link> is finished, there <emphasis>will be</emphasis> a
standard way to request this kind of processing.</para>
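
<para>For the XInclude half of the replacement, the substitution looks
roughly like this. The file name <filename>intro.xml</filename> and the
<tag>spec</tag> wrapper are only illustrative; the XInclude namespace URI is
the standard one.</para>

<programlisting>&lt;!-- Before: an external parsed entity, declared in the internal subset --&gt;
&lt;!DOCTYPE spec [
&lt;!ENTITY intro SYSTEM "intro.xml"&gt;
]&gt;
&lt;spec&gt;
&amp;intro;
&lt;/spec&gt;

&lt;!-- After: the same inclusion with XInclude and no DOCTYPE --&gt;
&lt;spec xmlns:xi="http://www.w3.org/2001/XInclude"&gt;
&lt;xi:include href="intro.xml"/&gt;
&lt;/spec&gt;</programlisting>

<para>The two are not quite interchangeable. An external parsed entity can
hold a fragment with several top-level elements, whereas a resource
included as XML by XInclude has to be a well-formed document in its own
right, and the inclusion only happens if an XInclude-aware processor is
part of the tool chain.</para>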

<para xml:id="p17">So do I really think we should drop the <code>&lt;!DOCTYPE&gt;</code>?
Yeah, probably. Tim's got some
<link xlink:href="http://www.tbray.org/ongoing/When/200x/2005/12/15/Drop-the-Doctype#p-2">pretty good arguments</link>
to support his position that it's not only unnecessary, it's actively harmful.
But I'm not entirely convinced. I don't think we can drop it yet.
Maybe in another few years we can; with a widely deployed pipeline language,
I think the stage would be set.</para>

</essay>

