XML Processing Model Working Group

Volume 8, Issue 138; 27 Oct 2005; last modified 08 Oct 2010

Public at last! One of the goals of this working group is to define a standard language for expressing the way in which XML processing is to be applied to a document or set of documents. In other words, how to get validation, XInclude, transformations, and other processes in the right order with the right parameters. Oh, and I'm chairing it.

I'm guessing that most of my readers who are even the slightest bit interested in this topic already have a pretty good idea of what I mean when I say “an XML processing model language” or “an XML pipeline” but in case you don't, I've written about it before, and Jeni Tennison presented an excellent motivational paper at XTech 2005, “Managing Complex Document Generation through Pipelining”.
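For anyone who hasn't met the idea, here's a minimal sketch of what "processes in the right order with the right parameters" can look like. It is only an illustration in Python, not a proposal for the Working Group's language. Every name in it (`require_title`, `strip_drafts`, `add_attribute`, `run_pipeline`, `PIPELINE`) is hypothetical, and the steps are toy stand-ins for real validation and transformation.

```python
import xml.etree.ElementTree as ET

# Each step takes a document plus keyword parameters and returns a document.

def require_title(doc, **params):
    """Toy stand-in for validation: reject documents without a <title>."""
    if doc.find("title") is None:
        raise ValueError("invalid document: missing <title>")
    return doc

def strip_drafts(doc, **params):
    """Toy stand-in for a transformation: drop children marked status="draft"."""
    for child in list(doc):
        if child.get("status") == "draft":
            doc.remove(child)
    return doc

def add_attribute(doc, name="processed", value="yes"):
    """Toy parameterized step: set one attribute on the root element."""
    doc.set(name, value)
    return doc

def run_pipeline(doc, steps):
    """Apply each (step, parameters) pair in order, feeding output to input."""
    for step, params in steps:
        doc = step(doc, **params)
    return doc

# The pipeline itself is just data: an ordered list of steps and parameters.
PIPELINE = [
    (require_title, {}),
    (strip_drafts, {}),
    (add_attribute, {"name": "stage", "value": "final"}),
]

source = ET.fromstring(
    '<doc><title>Pipelines</title>'
    '<para status="draft">x</para><para>y</para></doc>'
)
result = run_pipeline(source, PIPELINE)
print(ET.tostring(result, encoding="unicode"))
```

The point of the sketch is that the order and the parameters live in one declarative description (`PIPELINE`) rather than being scattered through scripts. A standard pipeline language would let that description be shared across tools.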

In any event, after much procedural wrangling, the W3C has finally chartered a new Working Group in the XML Activity to address the problem of a standard XML pipeline language, the XML Processing Model Working Group. And they persuaded me to chair it, for better or worse. :-) The charter lays out the scope, goals, and deliverables pretty well, so I won't bother recapitulating them here.

If you work for a W3C member company interested in pipelines, now's the time to get your AC rep to sign you up! If you don't work for a member company, but you have a burning interest in pipelines, let me know.

The WG doesn't really exist yet; the call-for-participation is out, but I have no idea who will join, what they'll bring to the table, or what they'll want to do. So even though I'm chairing, I won't attempt to speak in any official capacity for the Working Group. (I wouldn't speak in any official capacity in this forum anyway, but never mind.) That said, as a WG member, I'll put a stake in the ground early on.

There has been a lot of work done in this area. A half hour spent with a search engine and my mail archives turned up a number of efforts either about XML pipelines or directly related to them. (Apologies, in advance, if I left out your favorite. Let me know and I'll happily add it to the list.)

I want to figure out what 80% of the intersection of those languages is, write it up as a specification, slap “1.0” on it, and ship it. As soon as possible. Then I want to get support for that language coded up and into the Java™ Platform so it'll be available everywhere.

I have no philosophical objection to grander processing model schemes, but we needed this in 2001. I have no interest in an ocean boiling exercise to define the be-all and end-all of process choreography languages. Let's keep it simple.

Yes, it is likely that more powerful and sophisticated languages will come after 1.0. Yes, it would be best if the 1.0 language could be extended in some clean way to incorporate these new features. Yes, having some level of abstraction often aids in achieving those goals. But none of those facts diminish the value of a good, small, widely (in the best of all possible worlds, universally) deployed, practical and understandable pipeline language that will let a competent XML hacker craft a functional pipeline in Emacs without needing to refer to the documentation six times. IMNSHO.

But what's really important is your opinion, and we'll find that out by getting consensus on a revision of the requirements document to start with.

Comments

Hi Norm, congratulations on the new WG, which should create a really needed spec. Do you know that ISO has started work on almost the same topic as part of their DSDL project?

http://lists.dsdl.org/dsdl-discuss/2005-10/0005.html

The ISO WG should discuss this during their meeting, which is usually scheduled for the days preceding the XML conference.

—Posted by Jirka Kosek on 27 Oct 2005 @ 08:56 UTC #

By making it an XML language:

1/ you make it nearly impossible to embed it in an existing document (a flat, shell-like syntax with | and > could fit on one line in a PI)

2/ you ignore the many existing scripting environments that already do this, which are easier to type and understand, and which will be both more powerful and more natural to the people using them

The charter seems to be unambiguous about this: it must be XML and nothing else. Yet see how much nicer the compact RNG syntax is for programmers than its XML counterpart! Making a prerequisite of something that should IMHO be the result of testing and feedback sounds like a bad start. Just for libxml2 there are 2 special-purpose programming languages (xed + xmlstarlet) and of course bindings for most scripting languages. So there is a need... but nobody so far has felt that an XML dialect would be of interest to users.

Daniel

—Posted by Daniel Veillard on 27 Oct 2005 @ 09:55 UTC #

The language is vastly less interesting to me if I can't manipulate it with XML tools. I don't object in principle to a compact syntax, like RELAX NG's, but I definitely see it as a supplemental format.

—Posted by Norman Walsh on 27 Oct 2005 @ 10:20 UTC #

Finally! For Xopus we've been waiting for a standardized pipelining language for years.

Did you also consider that with a few extension functions, XSLT 2 could be a pipelining language? But it's probably not the best solution, with some important functionality being hidden in XPath expressions. Still, it could be an idea to provide an XSLT transformation from your XML pipelining format to XSLT, which would make the semantics of the format unambiguously clear.

Btw, the charter is at this moment behind a username/password.

—Posted by Sjoerd Visscher on 28 Oct 2005 @ 08:16 UTC #

Limiting the language to be XML sounds exactly like trying to design the 'make' facility with the a priori assumption that, to compile C code, the make syntax should be C. Or, to take another example, that 'ant' syntax should be Java.

This is metadata, this is data, but having the same representation and syntactic constraints as the targeted data sounds illogical to me. And I see no reason put forward anywhere for this a priori decision.

If this is final, I would rather watch the outcome of the WG from the sidelines than try to influence it, since obviously crucial and IMHO broken decisions have already been made.

Daniel

—Posted by Daniel Veillard on 28 Oct 2005 @ 09:02 UTC #

Great news on the new working group, glad to see this finally happening.

You could possibly consider Ant as a pipeline framework; there are certainly built-in XML processing tasks as well as extensions that support this usage. Currently I typically script XML processing pipelines using Ant. A typical scenario is having one task generate a stylesheet from a meta-stylesheet, then applying the generated stylesheet to one or more documents.

I've used this approach to build simple static websites (and also generate JSP pages), complete with multi-pass processing of the source documents with different stylesheets to generate indexes, etc.

I'd put Ant in the same general category as XML Pipeline in that it's up to the processor to define the processing order based on some analysis of dependencies, whereas Cocoon and other frameworks have a fixed processing order.

I made an abortive attempt to begin comparing pipeline frameworks back in 2002, some very brief notes are here: XMLPipelineFrameworks, including a comparison of XML-Pipeline and Cocoon.

—Posted by Leigh Dodds on 28 Oct 2005 @ 09:26 UTC #

Daniel, no decisions have been made. I expressed my opinion. Do you have a particular non-XML syntax in mind?

—Posted by Norman Walsh on 28 Oct 2005 @ 12:03 UTC #

The charter is public now. Sorry about that, Sjoerd.

—Posted by Norman Walsh on 28 Oct 2005 @ 12:06 UTC #

Hello,
That's big news, and we've been waiting a long time for it. Maybe having a spec now, when the idea of "what we should do with it" is clearer, is better than having a spec that we would have to fix a lot.
I would just mention that Serving XML also provides an XML language and an implementation. I'm not involved in it, but it's worth mentioning.
Second, making it an XML format is good. In fact, if it can stay as simple as XSLT 1.0, no problem. If it starts to become as obscure as XML Schema, Daniel's idea of providing a compact format should be taken into account. Implementing it in Java is an obvious choice, and then we could see how it will interact with SAX and DOM and the brand new StAX.
Because the fact is that a lot of transformations will not be in XML syntax (XSLT, XQuery, XSP, STX, etc.), but given as programming language code.
In short, very good news, and I'm in.

—Posted by xmlizer on 29 Oct 2005 @ 09:30 UTC #

Only the other day I was asked by a project if an XML 'pipeline language' existed, and had to mutter an answer with a pointer to your note, and I think I may have muttered something about using 'ant'. Now I can say "it's on its way"!

—Posted by Paul Downey on 29 Oct 2005 @ 10:45 UTC #

Norm, I don't have a specific syntax in mind, but I have a set of constraints in mind that an XML based language won't handle well:

1/ a simple processing model should be expressible in a single line, like a shell command with a pipe

2/ it should be possible to embed the expression of how to process a document in the document content itself (PI or comment)

3/ the language must be able to express complex processing models

4/ the language should be easy for a human to write and modify

An XML format would fail 1/, 2/ and 4/. The experience with XSLT does not bode well w.r.t. 3/: not that it blocks writing complex processing, but rather that it makes such complex code somewhat hard to maintain in an XML format. The main good points of an XML format from this use-case perspective are I18N and not having to write a new parser. I don't see those processing models being automatically generated or maintained, so the advantage of automated processing of an XML format can't compensate for 1/, 2/ and 4/, in my opinion.

Daniel

—Posted by Daniel Veillard on 29 Oct 2005 @ 06:56 UTC #

Congrats on the new working group. It sounds like Martin Bryan from the DSDL committee / ISO side is very interested in helping with validation management, which could be really great! Is a new processing model likely to mean a new XML Pipeline Definition Language? I mostly use Ant for pipelining, much as Leigh does, and think it must be a very common approach. I look forward to reading Sean McGrath's comments on the new working group and hope he gets involved.

—Posted by Gary on 31 Oct 2005 @ 07:07 UTC #

Back in 2000 I worked on a SAX-based pipelining/XML processing framework at Chrystal Software called Eclipse; see: http://hadleynet.org/marc/xml2000.pdf.

—Posted by Marc Hadley on 01 Nov 2005 @ 10:20 UTC #