Richard goes on to point out that until “there's some other lightweight macro-like facility, DTDs are essential.” I'm not sure I'd go so far as “essential”; I think I could live without it, but it wouldn't be pleasant. For an example of why, take a look at the top of your average W3C specification authored in XML Spec:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE spec SYSTEM "http://www.w3.org/2002/xmlspec/dtd/2.10/xmlspec.dtd" [
  <!ENTITY draft.DD "05">
  <!ENTITY draft.MM "01">
  <!ENTITY draft.day "5">
  <!ENTITY draft.month "January">
  <!ENTITY draft.year "2006">
  <!ENTITY iso6.doc.date "&draft.year;-&draft.MM;-&draft.DD;">
  <!ENTITY http-ident "http://example.org/TR/NOTE-example">
]>
<spec w3c-doctype='note'>
<header>
  <title>Example Specification</title>
  <version>Version 1.0</version>
  <w3c-designation>&http-ident;-&iso6.doc.date;</w3c-designation>
  <w3c-doctype>W3C NOTE</w3c-doctype>
  <pubdate>
    <day>&draft.day;</day>
    <month>&draft.month;</month>
    <year>&draft.year;</year>
  </pubdate>
  <publoc>
    <loc href="&http-ident;-&iso6.doc.date;">&http-ident;-&iso6.doc.date;</loc>
  </publoc>
  <altlocs>
    <loc href="&http-ident;.XML">XML</loc>
  </altlocs>
  <latestloc>
    <loc href="&http-ident;">&http-ident;</loc>
  </latestloc>
…
You don't need all those entities, but keeping all the date-related URIs and publication metadata accurate sure would be more tedious without them. This is especially true because a specification gathers a collection of “previous locations” as it develops. Each of those has a date too, so the header becomes a real date soup.
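The following minimal sketch shows the internal subset acting as a macro facility. It uses Python's standard-library ElementTree as a stand-in; nothing here is part of XML Spec. The parser expands internal entities, including entities that refer to other entities, before the application ever sees the text. The external DTD reference is omitted so the example does not depend on fetching xmlspec.dtd.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the internal subset above, without the external DTD
# reference, so nothing has to be fetched over the network.
DOC = """<!DOCTYPE header [
<!ENTITY draft.DD "05">
<!ENTITY draft.MM "01">
<!ENTITY draft.year "2006">
<!ENTITY iso6.doc.date "&draft.year;-&draft.MM;-&draft.DD;">
<!ENTITY http-ident "http://example.org/TR/NOTE-example">
]>
<header>
<publoc><loc href="&http-ident;-&iso6.doc.date;">&http-ident;-&iso6.doc.date;</loc></publoc>
</header>"""

loc = ET.fromstring(DOC).find("publoc/loc")
# Every entity reference, nested ones included, is already replaced.
print(loc.get("href"))  # → http://example.org/TR/NOTE-example-2006-01-05
print(loc.text)         # → http://example.org/TR/NOTE-example-2006-01-05
```

Editing one declaration, such as `draft.DD`, updates every place the date appears. That is the convenience the rest of this piece tries to keep without a DTD.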
And, of course, you don't need entity expansion to accomplish this. You could use m4, cpp, or any other text replacement tool, even plain sed. But those tools aren't XML-aware, and really, you'd like to do this in an XML-aware fashion. (You don't want to do the replacement in the middle of an element name or produce well-formedness errors.)
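Here is a small Python illustration of the two failure modes in that parenthetical. It uses `str.replace` as a stand-in for a sed-style substitution. The macro names `year` and `publisher` and the helpers `is_well_formed` and `replace_in_text` are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def replace_in_text(xml_text, old, new):
    """XML-aware replacement: touch character data only, never markup."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text.replace(old, new)
        if elem.tail:
            elem.tail = elem.tail.replace(old, new)
    # The serializer escapes &, < and > that appear in the new text.
    return ET.tostring(root, encoding="unicode")


# Plain text substitution also rewrites the element name <year>.
naive_name = "<pubdate><year>year</year></pubdate>".replace("year", "2006")
# An "&" in the replacement is not escaped, so the result is not well-formed.
naive_amp = "<note>publisher</note>".replace("publisher", "AT&T")

print(naive_name, is_well_formed(naive_name))  # → <pubdate><2006>2006</2006></pubdate> False
print(naive_amp, is_well_formed(naive_amp))    # → <note>AT&T</note> False
print(replace_in_text("<pubdate><year>year</year></pubdate>", "year", "2006"))
print(replace_in_text("<note>publisher</note>", "publisher", "AT&T"))
```

The last two lines print `<pubdate><year>2006</year></pubdate>` and `<note>AT&amp;T</note>`. In both cases the output stays well-formed because the substitution happens after parsing.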
My solution to this problem was to whip up a little XSLT to do the substitution. The stylesheet, ml-macro.xsl, searches attribute values and text content for macro names delimited by two regular expressions and (recursively) expands them. (I was very tempted to use “xml-macro”, but “xml” is a reserved prefix and my ego isn't quite big enough to willfully break that rule.) Macros can be defined in the source document, in an external macro file, or directly in the stylesheet. The latter can be used to build dynamic replacement text, for example, the current date and time.
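The stylesheet itself isn't reproduced here, so what follows is only a Python approximation of the behaviour just described, not ml-macro.xsl. Macros are looked up in a plain dictionary, which stands in for definitions from any of the three sources. Names between the default `[[` and `]]` delimiters are replaced in attribute values and character data, and replacement text is expanded recursively. Two behaviours are assumptions of this sketch: unknown names are left untouched, and a recursive definition raises an error.

```python
import re
import xml.etree.ElementTree as ET

# Default delimiters "[[" and "]]" written as regular expressions; the name
# in between is captured non-greedily so "[[a]]-[[b]]" yields two macros.
MACRO = re.compile(r"\[\[(.+?)\]\]")


def expand(text, macros, active=()):
    """Expand every [[name]] in text, expanding replacement text recursively."""
    def replace(match):
        name = match.group(1)
        if name in active:
            raise ValueError("recursive macro: " + " -> ".join(active + (name,)))
        if name not in macros:
            return match.group(0)  # assumption: unknown names are left alone
        return expand(macros[name], macros, active + (name,))
    return MACRO.sub(replace, text)


def expand_tree(elem, macros):
    """Apply expand() to attribute values and character data, never to names."""
    for key, value in list(elem.attrib.items()):
        elem.set(key, expand(value, macros))
    if elem.text:
        elem.text = expand(elem.text, macros)
    for child in elem:
        expand_tree(child, macros)
        if child.tail:
            child.tail = expand(child.tail, macros)


macros = {
    "draft.DD": "05", "draft.MM": "01", "draft.year": "2006",
    "iso6.doc.date": "[[draft.year]]-[[draft.MM]]-[[draft.DD]]",
    "http-ident": "http://example.org/TR/NOTE-example",
}
publoc = ET.fromstring('<publoc><loc href="[[http-ident]]-[[iso6.doc.date]]">'
                       '[[http-ident]]-[[iso6.doc.date]]</loc></publoc>')
expand_tree(publoc, macros)
print(ET.tostring(publoc, encoding="unicode"))
```

This prints the `publoc` element with both the `href` value and the text set to `http://example.org/TR/NOTE-example-2006-01-05`. Replacement text is written back through a serializer, so a macro can never land inside an element name. Expanding a macro into element content, which the external definitions below allow, would require parsing the fragment and splicing it in; this sketch handles text only.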
For my document collection “[[” and “]]” are reasonable delimiters, so I made them the default. You can change them, even on a per-document basis.
The stylesheet recognizes the following constructs:
<?ml-macro name="macroname" text="replacement text"?>
(Yes, I'm using processing instructions. I think they're the right tool for this job. If PIs offend your aesthetic sensibilities, get over it.)
Defines the macro “macroname” with the replacement text “replacement text”. The replacement text may contain other macros, but they must not be used recursively.
Loads macros defined externally in “someURI”. That document should consist of an ml:collection element containing one or more ml:macro elements. Each ml:macro element has a mandatory name attribute containing the name of the macro. The content of the element is the replacement text. In this case, the replacement text can be any well-formed XML fragment, including element content. The replacement text may contain other macros, but they must not be used recursively.
Defines the open delimiter regular expression. The default is effectively an expression that matches a literal “[[”.
Defines the close delimiter regular expression. The default is effectively an expression that matches a literal “]]”.
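Reading the definitions is a separate step from expanding them. The Python sketch below is again an approximation, not the stylesheet. It collects `<?ml-macro name=… text=…?>` instructions wherever they occur, loads an `ml:collection` document, builds the name-matching expression from configurable delimiters, and rejects recursive definitions. The text does not give the namespace URI bound to `ml:`, so `http://example.org/ns/ml-macro` is a placeholder. The function names are hypothetical. The PI forms that point at “someURI” or override the delimiters aren't shown above, so the functions take the collection text and the delimiter expressions as plain arguments.

```python
import re
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.sax.saxutils import escape

# Placeholder namespace URI for ml:collection / ml:macro; the real one may differ.
ML = "http://example.org/ns/ml-macro"

# Pseudo-attributes in PI data, e.g.  name="draft.DD" text="05"
PSEUDO_ATTR = re.compile(r"""([\w.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def macro_pattern(open_re=r"\[\[", close_re=r"\]\]"):
    """Build the name-matching regex from the open/close delimiter expressions."""
    # Non-capturing groups keep an alternation like r"\[\[|<<" self-contained.
    return re.compile(f"(?:{open_re})(.+?)(?:{close_re})")


def pi_macros(document_text):
    """Collect <?ml-macro name="..." text="..."?> definitions from a document."""
    macros = {}

    def walk(node):
        for child in node.childNodes:
            if (child.nodeType == child.PROCESSING_INSTRUCTION_NODE
                    and child.target == "ml-macro"):
                # PI data is not entity-decoded; values are taken literally.
                attrs = {m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
                         for m in PSEUDO_ATTR.finditer(child.data)}
                if "name" in attrs and "text" in attrs:
                    macros[attrs["name"]] = attrs["text"]
            walk(child)

    walk(minidom.parseString(document_text))
    return macros


def collection_macros(collection_text):
    """Read an ml:collection of ml:macro elements into {name: XML fragment}."""
    root = ET.fromstring(collection_text)
    if root.tag != f"{{{ML}}}collection":
        raise ValueError(f"expected ml:collection, found {root.tag}")
    macros = {}
    for macro in root.findall(f"{{{ML}}}macro"):
        name = macro.get("name")
        if name is None:
            raise ValueError("ml:macro lacks its mandatory name attribute")
        # Mixed content is kept as serialized XML, so markup survives.
        macros[name] = escape(macro.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in macro)
    return macros


def check_no_recursion(macros, pattern):
    """Raise ValueError if a definition refers back to itself, directly or not."""
    def visit(name, path):
        if name in path:
            raise ValueError("recursive macro: " + " -> ".join(path + (name,)))
        for ref in pattern.findall(macros.get(name, "")):
            visit(ref, path + (name,))

    for name in macros:
        visit(name, ())
```

A collection entry such as `<ml:macro name="w3c">the <emph>W3C</emph> staff</ml:macro>` comes back as the string `the <emph>W3C</emph> staff`. It would still need to be parsed into nodes before it could replace a macro in element content, a step left out here.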
Using this approach, the specification shown above becomes:
<?xml version="1.0" encoding="utf-8"?>
<?ml-macro name="draft.DD" text="05"?>
<?ml-macro name="draft.MM" text="01"?>
<?ml-macro name="draft.day" text="5"?>
<?ml-macro name="draft.month" text="January"?>
<?ml-macro name="draft.year" text="2006"?>
<?ml-macro name="iso6.doc.date" text="[[draft.year]]-[[draft.MM]]-[[draft.DD]]"?>
<?ml-macro name="http-ident" text="http://example.org/TR/NOTE-example"?>
<spec w3c-doctype='note'>
<header>
  <title>Example Specification</title>
  <version>Version 1.0</version>
  <w3c-designation>[[http-ident]]-[[iso6.doc.date]]</w3c-designation>
  <w3c-doctype>W3C NOTE</w3c-doctype>
  <pubdate>
    <day>[[draft.day]]</day>
    <month>[[draft.month]]</month>
    <year>[[draft.year]]</year>
  </pubdate>
  <publoc>
    <loc href="[[http-ident]]-[[iso6.doc.date]]">[[http-ident]]-[[iso6.doc.date]]</loc>
  </publoc>
  <altlocs>
    <loc href="[[http-ident]].XML">XML</loc>
  </altlocs>
  <latestloc>
    <loc href="[[http-ident]]">[[http-ident]]</loc>
  </latestloc>
…
This works and I'm going to start using it. With the addition of XInclude to replace external parsed entities (and some uses of external unparsed entities), this approach seems to satisfy the requirements met by entity expansion. Except, of course, for the fact that it uses a new syntax, requires two passes, and isn't supported in any standard way.
On the last point, I hope that when the work of the XML Processing Model Working Group is finished, there will be a standard way to request this kind of processing.
So do I really think we should drop the
Yeah, probably. Tim's got some pretty good arguments to support his position that it's not only unnecessary, it's actively harmful.
But I'm not entirely convinced. I don't think we can drop it yet. Maybe in another few years we can; with a widely deployed pipeline language, I think the stage would be set.