Locating Schemas

Volume 6, Issue 104; 03 Nov 2003

Lots of document processing is predicated on the idea that the document being processed is valid. Validity assessment requires a schema. So given a document, what’s the right schema to apply?

One must look for one thing only, to find many.

Cesare Pavese

Given a random XML document, one of the things you might want to know about it is, “what is an appropriate schema to apply?” Now, for DTDs, this is a simple question: look at the document type declaration. If it has one, the public and system identifiers will tell you what DTD to apply; if it doesn’t have one, there’s no DTD to apply.
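To make the DTD case concrete, here is a minimal Python sketch that reads the public and system identifiers from a document type declaration. The function name `doctype_ids` is purely illustrative, and the sketch only reports the identifiers; it makes no attempt to fetch or apply the DTD.

```python
from xml.dom import minidom

def doctype_ids(xml_text):
    """Return (publicId, systemId) from the DOCTYPE, or None if there is none."""
    doctype = minidom.parseString(xml_text).doctype
    if doctype is None:
        # No document type declaration, so there is no DTD to apply.
        return None
    return doctype.publicId, doctype.systemId

with_dtd = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml"/>'
)
ids = doctype_ids(with_dtd)
no_ids = doctype_ids("<doc/>")
```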

In the schema world, the question is more complicated. For one thing, there might be many different schemas that could be applied. (This is, of course, equally true of DTDs, but the XML Recommendation doesn’t provide any flexibility to apply alternate DTDs.) The “schema location hint” of W3C XML Schema is one possibility. But what a hack.
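For reference, the hint is carried in the `xsi:schemaLocation` attribute, whose value is a whitespace-separated list of namespace and location pairs. The sketch below pulls those pairs out of a document. It is a simplification: it looks only at the root element, although the attribute may appear on other elements too, and the sample document and its `http://example.com/po` namespace are made up for illustration.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_location_hints(xml_text):
    """Return {namespace: schema location} from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    # Tokens alternate: namespace name, then a location hint for that namespace.
    return dict(zip(tokens[0::2], tokens[1::2]))

doc = """<po xmlns="http://example.com/po"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.com/po po.xsd"/>"""
hints = schema_location_hints(doc)
```

Note that the result is only a hint: a processor is free to ignore it, and the document author, not the recipient, decides what it says.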

Another possibility is some sort of processing instruction. Some of my colleagues frown on them, but I like them. However, a single, standard PI probably isn’t going to do the trick. Getting all the flexibility one wants into a single PI is likely to be hard and will probably result in a pretty unwieldy PI.

James Clark proposes another answer in nXML mode: a configurable set of rules to locate a schema. The rules are contained in one or more schema locating files, which are XML documents.

Makes sense to me. So much so that I went off yesterday and constructed a Perl implementation of the locating rules algorithm.

The module implements applyFollowingRules, default, doctypePublicId, documentElement, include, namespace, transformURI, typeId, typeIdBase, typeIdProcessingInstruction, and uri. Note that several of these are not yet implemented by nxml-mode and are not even allowed by the nxml-mode schema for locating rules files. If the syntax or semantics of locating rules files change, I’ll try to keep the module up to date.
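To give a flavor of how such rules work, here is a rough Python sketch that handles only three of them: namespace, documentElement, and typeId. It is not the Perl module and not nxml-mode's implementation. The rule file syntax (the `ns`, `localName`, `id`, `typeId`, and `uri` attributes and the locating-rules namespace) follows my understanding of the nxml-mode format, and the function names and sample rules are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of locating rules files, as I understand the nxml-mode format.
LR = "{http://thaiopensource.com/ns/locating-rules/1.0}"

RULES = """<locatingRules xmlns="http://thaiopensource.com/ns/locating-rules/1.0">
  <namespace ns="http://www.w3.org/1999/xhtml" typeId="XHTML"/>
  <documentElement localName="book" typeId="DocBook"/>
  <typeId id="XHTML" uri="xhtml.rnc"/>
  <typeId id="DocBook" uri="docbook.rnc"/>
</locatingRules>"""

def split_tag(tag):
    """Split ElementTree's '{ns}local' tag form into (ns, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag

def resolve(rules, rule):
    """Turn a matched rule into a schema URI, following typeId indirection."""
    if rule.get("uri") is not None:
        return rule.get("uri")
    type_id, seen = rule.get("typeId"), set()
    while type_id is not None and type_id not in seen:
        seen.add(type_id)  # guard against typeId rules that refer to each other
        for t in rules.findall(LR + "typeId"):
            if t.get("id") == type_id:
                if t.get("uri") is not None:
                    return t.get("uri")
                type_id = t.get("typeId")
                break
        else:
            return None  # no typeId rule defines this identifier
    return None

def locate_schema(rules_text, doc_text):
    """Return the schema URI chosen by the first matching rule, else None."""
    rules = ET.fromstring(rules_text)
    ns, local = split_tag(ET.fromstring(doc_text).tag)
    for rule in rules:  # rules are tried in document order
        if rule.tag == LR + "namespace":
            matched = rule.get("ns") == ns
        elif rule.tag == LR + "documentElement":
            # The prefix attribute is ignored here; ElementTree discards prefixes.
            matched = rule.get("localName") in (None, local)
        else:
            continue
        if matched:
            schema = resolve(rules, rule)
            if schema is not None:
                return schema
    return None

xhtml_schema = locate_schema(RULES, '<html xmlns="http://www.w3.org/1999/xhtml"/>')
book_schema = locate_schema(RULES, "<book/>")
```

The indirection through typeId is what makes the scheme flexible: several matching rules can point at one type identifier, and changing the schema for that type means editing a single line.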

Share and enjoy.

(N.B. It’s pretty rough alpha code at the moment and not very well documented. Please report problems, if you find any.)