DocBook XSD?

Volume 14, Issue 52; 19 Dec 2011

But is it valid? Not the document, the schema?

When we switched to RELAX NG for DocBook, we did it because it was the best choice. It's still the best choice: in RELAX NG, the sorts of constraints that prose-schemas require are easy to express and there are no unnecessary ambiguity constraints.

I had always planned to generate W3C XML Schemas (and DTDs) by mechanical translation from the RELAX NG sources. There's an argument to be made for making the sources something else entirely and generating all the schemas from that, but it never seemed necessary (and I don't think it would help).

The problem is, it's fiendishly hard to write a tool that will translate from DocBook RNG to DocBook XSD. There are some recognizable patterns, but then there are some things (like merging the content models for HTML and CALS tables) that seem to defy straightforward heuristics.

I spent tens, perhaps hundreds, of hours in three separate, equally failed attempts to write a transformation (or series of transformations) to go from DocBook RNG to DocBook XSD. I still think it's possible, but probably not without building some sort of larger framework. (If you've got a grad student looking for a project, …)

Torn between the prospect of trying again or just admitting defeat and writing the schema by hand, I finally caved and decided to write it by hand. It took several days and was fairly tedious, but I got there. Maybe.

It turns out that there's room for interpretation in what constitutes ambiguity in XML Schemas. Ambiguous ambiguity. Oh, joy!
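For illustration only, here is the least controversial kind of ambiguity: a choice that offers the same element twice, so a processor cannot attribute an incoming element to exactly one particle. The schema fragment and the function name `duplicate_choice_branches` are hypothetical and not taken from the DocBook XSD, and the check is a rough heuristic that looks only at direct children of `xs:choice`. It is a sketch of the idea, not an implementation of the full Unique Particle Attribution rule, which also has to account for nested groups, wildcards, and substitution groups.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical fragment for illustration; not from the DocBook XSD.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="xref"/>
  <xs:element name="link"/>
  <xs:complexType name="inline.type">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="xref"/>
      <xs:element ref="link"/>
      <xs:element ref="xref"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
"""


def duplicate_choice_branches(schema_text):
    """Return (type name, element name) pairs where one xs:choice
    lists the same element more than once among its direct children."""
    root = ET.fromstring(schema_text)
    found = []
    for ctype in root.iter(XS + "complexType"):
        for choice in ctype.iter(XS + "choice"):
            # Count element particles by their ref (or local name).
            names = Counter(
                el.get("ref") or el.get("name")
                for el in choice.findall(XS + "element")
            )
            found.extend(
                (ctype.get("name"), name)
                for name, count in names.items()
                if count > 1
            )
    return found


print(duplicate_choice_branches(SCHEMA))  # [('inline.type', 'xref')]
```

Cases like this one are easy to spot. The room for interpretation presumably lies in the harder cases that a simple scan like this cannot see.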

So if your favorite XML tool still uses XSD exclusively, here's what I'd like to know:

  1. Have you asked about RELAX NG support? It is the [expletive deleted, -ed] twenty-first century!

  2. Does my hand-crafted DocBook XSD (XSD 1.0 or XSD 1.1) work in that tool, or does the tool complain that the schema is invalid?

I fear that removing the ambiguity currently present would be an enormous pain in the [expletive deleted -ed]. If I don't have to…


Did you try Trang? I had quite some success with using its products with Eclipse XML editor (for much less complicated schema).

—Posted by Matěj Cepl on 19 Dec 2011 @ 05:33 UTC #

The DocBook schemas are way beyond the capabilities of Trang.

—Posted by Norman Walsh on 19 Dec 2011 @ 05:43 UTC #

Maybe. What I meant is that I am not sure how acute the lack of an XSD really is, because for many purposes even such half-solutions seem to work.

—Posted by Matěj Cepl on 19 Dec 2011 @ 06:54 UTC #

Just did a quick test in Xopus. We don't support the processContents attribute on xs:any, so that's a step back from the previous XSD. But other than that, seems to be loading just fine...

And for an editor like Xopus a good XSD is necessary; Trang is indeed not an option in this case. So thanks for making this effort!

—Posted by Fredrik Geers on 20 Dec 2011 @ 11:07 UTC #

Using Oxygen 13.1, validation of the 1.0 XSD file fails with lots of errors. The first one reads

docbook-ns-10.xsd Program name: Xerces Error level: error Description: cos-nonambig: "":xref and "":xref (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles. Start: 450:5 End: 450:19

It complains about source code line 450, which is a reference to the db.common.attributes group, inside the definition of the complexType named inline.type starting at line 437.

I think there are about 33 errors like this. As far as I can see, it is always cos-nonambig.

Although Oxygen supports RELAX NG very well, we are highly interested in a good XSD for DocBook. The main reason is Adobe FrameMaker, which is in fact based on DTDs but does support XSD (does it translate it to a DTD internally?).

We have to customize DocBook before using it in FrameMaker because of some of FrameMaker's limitations (nested tables are not supported, for example). Doing so in RELAX NG is useless, because there is no way to generate the customized DTD. My hope is that customization can be done in XSD, which would help us get the customized DTD via standard tools (like the one built into Oxygen).

Cheers, Frank

—Posted by Frank Steimke on 22 Dec 2011 @ 11:18 UTC #

Thanks, Frank. I've discovered that Xerces reports ambiguity problems so I've got a line on fixing them. I'm just hoping it's not too painful to do so.

—Posted by Norman Walsh on 22 Dec 2011 @ 11:57 UTC #