Playing with transclusion

Volume 14, Issue 35; 04 Oct 2011

In my continuing efforts to explore transclusion, some running code.

I've mentioned transclusion before (and before that). The DocBook community would like a better solution for transclusion in DocBook documents, but the ID/IDREF problems that DocBook authors face are hardly unique to DocBook. In an ideal world, I think we'd figure out how to solve this problem for all XML users, rather than just for DocBook users.

On that basis, I took the issue to the XML Core Working Group because it seem(ed) like something XInclude might be extended to handle. That's not so clear to me now.

Several things make it difficult to handle this problem at a level above any specific vocabulary. (For a reminder about what the problems are, see the requirements and markup design proposed for DocBook.)

  1. The processor must be able to identify “ID” values in the transcluded content. That requires more than the bare minimum necessary to handle well-formed XML, because whether an attribute is an ID is normally declared in a DTD or schema. Fortunately, xml:id helps here.

  2. Unfortunately, the processor must also be able to identify “IDREF” values in the transcluded content. Not only does that require more than the bare minimum, there's nothing like “xml:idref” to help us here.

  3. What's more, if the author has no way to specify parameters to the fixup process, then only a limited number of fixup options are actually available.

    The transclusion proposal identifies four possible fixup modes: none, strip, prefix, and auto. Specified globally, “none” and “strip” make no sense: none is the status quo and strip discards all the IDs. With only a single, global prefix, “prefix” doesn't make any sense either, because every transcluded copy would get the same prefix, so including the same content twice would still produce duplicate IDs. That just leaves “auto”.

    There are also four options for scoping IDREF values: user, local, near, and global. Here, “user” is the status quo. The other values are all plausible, but there's another problem: if the author can't override the ID value on the root of the included content, then (auto) fixup makes it impossible to link across a transclusion boundary, since the generated IDs can't be known in advance. That limits the value of transclusion in a different way.
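Points one and three are easy to see in code. The following minimal Python sketch (standard library only; the sample module, its IDs, and the prefixes are made up for illustration, and the DocBook namespace is omitted for brevity) finds IDs through the namespace name of xml:id, which works for any vocabulary, but has to be told which attributes are IDREFs. It then shows that one global prefix still leaves duplicate IDs when the same module is transcluded twice, while a distinct prefix per inclusion does not.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# xml:id is in the reserved XML namespace, so ElementTree reports it under
# this namespace name no matter which vocabulary the document uses.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def collect_ids(root):
    """Every xml:id value in the tree, in document order."""
    return [el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None]


def collect_idrefs(root, idref_attrs):
    """There is no xml:idref, so the caller must say which attributes are IDREFs."""
    return [el.get(a) for el in root.iter() for a in idref_attrs
            if el.get(a) is not None]


def duplicates(ids):
    """ID values that occur more than once, sorted."""
    return sorted(v for v, n in Counter(ids).items() if n > 1)


def prefixed_ids(modules, prefixes):
    """IDs of each transcluded module after prepending that module's prefix."""
    return [p + i for m, p in zip(modules, prefixes) for i in collect_ids(m)]


# One hypothetical module, transcluded twice into the same document.
MODULE = '<section xml:id="intro"><para xml:id="p1"><xref linkend="intro"/></para></section>'
modules = [ET.fromstring(MODULE), ET.fromstring(MODULE)]

print(collect_idrefs(modules[0], ["linkend"]))              # ['intro']
print(duplicates(prefixed_ids(modules, ["xxx-", "xxx-"])))  # ['xxx-intro', 'xxx-p1']
print(duplicates(prefixed_ids(modules, ["c1-", "c2-"])))    # []
```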

I understand these things better now because I did a little implementation.

When I approached the XML Core WG, points one and two above were raised almost immediately. Since those issues clearly made the group reluctant to add any sort of ID/IDREF fixup to XInclude, I suggested that it might be enough for XInclude simply to provide some clue about where each inclusion occurred. I took an action to try to figure out if that was true.

XInclude already performs “XML base” and “XML language” fixup, adding xml:base and xml:lang attributes to the included root elements when necessary. I proposed adding xml:root="true" to those root elements as well. (I'm not serious about the name, just the functionality.)

To explore whether this is sufficient, I hacked XML Calabash 0.9.35 a bit. If you specify cx:mark-roots="true" as an extension attribute on the XInclude step, it'll add a cx:root attribute (with the value “true”) to the included root elements.

Then I wrote a stylesheet to perform the transclusion processing.

The results were unsatisfying for the reasons outlined in point three above.

So I made an even more egregious hack. If you specify cx:copy-attributes="true" as an extension attribute on the XInclude step, it'll copy any namespace-qualified attributes that appear on the xi:include element onto the included root elements.
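The effect of the two extension flags can be approximated outside XML Calabash. This Python sketch is not an XInclude implementation: there is no URI resolution, no xpointer support, no loop detection, and no xml:base or xml:lang fixup. A hypothetical `xinclude` function replaces each xi:include with the root of a document looked up by href in an in-memory dictionary, then, if asked, copies the include's namespace-qualified attributes onto that root and marks it with cx:root="true".

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"
CX = "{http://xmlcalabash.com/ns/extensions}"


def xinclude(elem, sources, mark_roots=False, copy_attributes=False):
    """Replace xi:include descendants of elem with the roots they point to.

    sources maps an href to a string of XML; it stands in for real URI
    resolution. Nested includes are expanded too (no loop detection).
    """
    for i, child in enumerate(list(elem)):
        if child.tag != XI + "include":
            xinclude(child, sources, mark_roots, copy_attributes)
            continue
        included = ET.fromstring(sources[child.get("href")])
        xinclude(included, sources, mark_roots, copy_attributes)
        if copy_attributes:
            # ElementTree spells qualified names as "{uri}local"; unqualified
            # attributes such as href and parse are left behind.
            for name, value in child.attrib.items():
                if name.startswith("{"):
                    included.set(name, value)
        if mark_roots:
            included.set(CX + "root", "true")
        included.tail = child.tail
        elem.remove(child)
        elem.insert(i, included)  # same position the xi:include occupied
    return elem


BOOK = """<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:cx="http://xmlcalabash.com/ns/extensions">
<title>Book Title</title>
<xi:include href="chap1.xml" cx:prefix="c1" cx:id="refid1"/>
</book>"""
CHAP1 = ('<chapter xmlns="http://docbook.org/ns/docbook" xml:id="chap1">'
         '<title>First Chapter</title></chapter>')

book = xinclude(ET.fromstring(BOOK), {"chap1.xml": CHAP1},
                mark_roots=True, copy_attributes=True)
chapter = book[1]
print(chapter.get(CX + "prefix"), chapter.get(CX + "id"), chapter.get(CX + "root"))
# c1 refid1 true
```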

In other words, if you run this document:

<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude"
      xmlns:cx="http://xmlcalabash.com/ns/extensions">
<title>Book Title</title>

<xi:include href="chap1.xml" cx:prefix="c1" cx:id="refid1"/>
<xi:include href="chap2.xml" cx:prefix="c2" cx:idfixup="prefix" cx:linkscope="local"/>

</book>

Through this pipeline:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0"
            xmlns:cx="http://xmlcalabash.com/ns/extensions">

<p:xinclude cx:mark-roots="true" cx:copy-attributes='true'/>

</p:pipeline>

You'll get something like this:

<book xmlns="http://docbook.org/ns/docbook"
      xmlns:cx="http://xmlcalabash.com/ns/extensions"
      xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Book Title</title>

<chapter cx:prefix="c1" cx:id="refid1" xml:id="chap1" cx:root="true">
<title>First Chapter</title>
...

Yes, it's an awful hack. No, it's probably not compliant with the specs. But it gives us a way to explore the problem with actual running code and examples.

Here's the current state of my stylesheet that performs transclusion processing:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:db="http://docbook.org/ns/docbook"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                exclude-result-prefixes="xs db cx"
                version="2.0">

<xsl:param name="default-idfixup" select="'prefix'"/>
<xsl:param name="default-linkscope" select="'local'"/>
<xsl:param name="default-prefix" select="'xxx-'"/>

<xsl:key name="id" match="*" use="@xml:id"/>

<xsl:preserve-space elements="*"/>

<xsl:template match="*[@xml:id and ancestor-or-self::*[@cx:root='true']]">
  <xsl:variable name="idfixup"
                select="(ancestor-or-self::*[@cx:idfixup][1]/@cx:idfixup, $default-idfixup)[1]"/>
  <xsl:variable name="linkscope"
                select="(ancestor-or-self::*[@cx:linkscope][1]/@cx:linkscope, $default-linkscope)[1]"/>
  <xsl:variable name="id" select="if (@cx:root='true' and @cx:id) then string(@cx:id) else string(@xml:id)"/>

  <xsl:copy>
    <xsl:choose>
      <xsl:when test="$idfixup = 'none'">
        <xsl:copy-of select="@* except (@cx:*)"/>
      </xsl:when>
      <xsl:when test="$idfixup = 'strip'">
        <xsl:copy-of select="@* except (@xml:id|@cx:*)"/>
      </xsl:when>
      <xsl:when test="$idfixup = 'prefix' or $idfixup = 'auto'">
        <xsl:copy-of select="@* except (@xml:id|@cx:*)"/>
        <xsl:attribute name="xml:id" select="concat(cx:prefix(ancestor-or-self::*[@cx:root='true'][1]),$id)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message terminate="yes">
          <xsl:text>Invalid $idfixup value: </xsl:text>
          <xsl:value-of select="$idfixup"/>
        </xsl:message>
      </xsl:otherwise>
    </xsl:choose>

    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

<xsl:template match="*[@linkend and ancestor-or-self::*[@cx:root='true']]">
  <xsl:variable name="idfixup"
                select="(ancestor-or-self::*[@cx:idfixup][1]/@cx:idfixup, $default-idfixup)[1]"/>
  <xsl:variable name="linkscope"
                select="(ancestor-or-self::*[@cx:linkscope][1]/@cx:linkscope, $default-linkscope)[1]"/>

  <xsl:copy>
    <xsl:choose>
      <xsl:when test="$linkscope = 'user'">
        <xsl:copy-of select="@* except (@cx:*)"/>
      </xsl:when>
      <xsl:when test="$linkscope = 'local'">
        <xsl:copy-of select="@* except (@linkend|@cx:*)"/>

        <xsl:choose>
          <xsl:when test="$idfixup = 'prefix' or $idfixup = 'auto'">
            <xsl:attribute name="linkend"
                           select="concat(cx:prefix(ancestor-or-self::*[@cx:root='true'][1]),@linkend)"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:message terminate="yes">
              <xsl:text>Unsupported $idfixup value for $linkscope="local": </xsl:text>
              <xsl:value-of select="$idfixup"/>
            </xsl:message>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>

      <xsl:when test="$linkscope = 'near'">
        <xsl:copy-of select="@* except (@linkend|@cx:*)"/>

        <xsl:variable name="id" select="@linkend"/>

        <!-- element()? rather than element(): the linkend may match no ID at all -->
        <xsl:variable name="target" as="element()?">
          <xsl:choose>
            <xsl:when test="preceding::*[@xml:id=$id]">
              <xsl:sequence select="preceding::*[@xml:id=$id][1]"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:sequence select="following::*[@xml:id=$id][1]"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>

        <xsl:choose>
          <xsl:when test="$target/ancestor-or-self::*[@cx:root='true']">
            <xsl:variable name="root" select="$target/ancestor-or-self::*[@cx:root='true'][1]"/>
            <xsl:choose>
              <xsl:when test="$idfixup = 'prefix' or $idfixup = 'auto'">
                <xsl:attribute name="linkend" select="concat(cx:prefix($root),@linkend)"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:copy-of select="@linkend"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy-of select="@linkend"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>

      <xsl:when test="$linkscope = 'global'">
        <xsl:copy-of select="@* except (@linkend|@cx:*)"/>

        <xsl:variable name="target" select="key('id', @linkend)[1]"/>

        <xsl:choose>
          <xsl:when test="$target/ancestor-or-self::*[@cx:root='true']">
            <xsl:variable name="root" select="$target/ancestor-or-self::*[@cx:root='true'][1]"/>
            <xsl:choose>
              <xsl:when test="$idfixup = 'prefix' or $idfixup = 'auto'">
                <xsl:attribute name="linkend" select="concat(cx:prefix($root),@linkend)"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:copy-of select="@linkend"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy-of select="@linkend"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message terminate="yes">Invalid $linkscope value: <xsl:value-of select="$linkscope"/></xsl:message>
      </xsl:otherwise>
    </xsl:choose>

    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@* except (@cx:*)"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

<xsl:template match="comment()|processing-instruction()|text()">
  <xsl:copy/>
</xsl:template>

<xsl:function name="cx:prefix" as="xs:string">
  <xsl:param name="node" as="element()"/>

  <xsl:variable name="idfixup"
                select="($node/ancestor-or-self::*[@cx:idfixup][1]/@cx:idfixup, $default-idfixup)[1]"/>
  <xsl:variable name="linkscope"
                select="($node/ancestor-or-self::*[@cx:linkscope][1]/@cx:linkscope, $default-linkscope)[1]"/>
  <xsl:variable name="prefix"
                select="($node/ancestor-or-self::*[@cx:prefix][1]/@cx:prefix, $default-prefix)[1]"/>

  <xsl:choose>
    <xsl:when test="$idfixup = 'prefix' and $node/@cx:prefix">
      <xsl:value-of select="$node/@cx:prefix"/>
    </xsl:when>
    <xsl:when test="$idfixup = 'prefix'">
      <xsl:value-of select="$prefix"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- auto -->
      <xsl:value-of select="concat(generate-id($node),'-')"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

</xsl:stylesheet>
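For experimenting without an XSLT 2.0 processor, here is a rough Python transliteration of the stylesheet's core. It is a sketch, not a replacement: it assumes the input has already been marked (cx:root on included roots, cx:* attributes copied onto them), uses the same defaults as the stylesheet parameters, substitutes a position-based string for generate-id(), leaves out the “near” link scope, and drops the DocBook namespace from the sample to keep it short. The function name `fixup` is hypothetical.

```python
import xml.etree.ElementTree as ET

CX = "{http://xmlcalabash.com/ns/extensions}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Same defaults as the stylesheet's default-* parameters.
DEFAULTS = {"idfixup": "prefix", "linkscope": "local", "prefix": "xxx-"}


def fixup(doc):
    """Apply ID/IDREF fixup in place to a tree already marked with cx:root."""
    parents = {c: p for p in doc.iter() for c in p}
    order = {el: n for n, el in enumerate(doc.iter())}

    def chain(el):  # ancestor-or-self, nearest first
        while el is not None:
            yield el
            el = parents.get(el)

    def setting(el, name):  # nearest cx:name on el or an ancestor, else default
        return next((a.get(CX + name) for a in chain(el)
                     if a.get(CX + name) is not None), DEFAULTS[name])

    def root_of(el):  # nearest transcluded root, or None outside any inclusion
        return next((a for a in chain(el) if a.get(CX + "root") == "true"), None)

    def prefix(root):
        if setting(root, "idfixup") == "prefix":
            return setting(root, "prefix")
        return f"d{order[root]}-"  # "auto": a stand-in for generate-id()

    # Original IDs, first occurrence wins (like key('id', ...)[1]).
    first_by_id = {}
    for el in doc.iter():
        if el.get(XML_ID) is not None:
            first_by_id.setdefault(el.get(XML_ID), el)

    changes = []  # (element, attribute, new value or None); applied at the end
    for el in doc.iter():
        root = root_of(el)
        if root is None:
            continue
        fix = setting(el, "idfixup")
        if el.get(XML_ID) is not None:
            new_id = el.get(CX + "id") if el is root and el.get(CX + "id") else el.get(XML_ID)
            if fix == "strip":
                changes.append((el, XML_ID, None))
            elif fix in ("prefix", "auto"):
                changes.append((el, XML_ID, prefix(root) + new_id))
            elif fix != "none":
                raise ValueError(f"Invalid idfixup value: {fix}")
        ref = el.get("linkend")
        if ref is None:
            continue
        scope = setting(el, "linkscope")
        if scope == "local":
            if fix not in ("prefix", "auto"):
                raise ValueError(f"Unsupported idfixup value for local scope: {fix}")
            changes.append((el, "linkend", prefix(root) + ref))
        elif scope == "global":
            target = first_by_id.get(ref)
            troot = root_of(target) if target is not None else None
            if troot is not None and fix in ("prefix", "auto"):
                changes.append((el, "linkend", prefix(troot) + ref))
        elif scope != "user":  # "near" is left out of this sketch
            raise ValueError(f"Unsupported linkscope value: {scope}")

    for el, attr, value in changes:
        if value is None:
            del el.attrib[attr]
        else:
            el.set(attr, value)
    for el in doc.iter():  # drop the cx:* bookkeeping attributes
        for name in [n for n in el.attrib if n.startswith(CX)]:
            del el.attrib[name]
    return doc


BOOK = """<book xmlns:cx="http://xmlcalabash.com/ns/extensions">
  <chapter xml:id="chap1" cx:root="true" cx:prefix="c1-" cx:id="refid1">
    <para xml:id="p1"><xref linkend="p1"/></para>
  </chapter>
  <chapter xml:id="chap1" cx:root="true" cx:prefix="c2-" cx:linkscope="global">
    <xref linkend="p1"/>
  </chapter>
</book>"""
book = fixup(ET.fromstring(BOOK))
print([el.get(XML_ID) for el in book.iter() if el.get(XML_ID)])  # ['c1-refid1', 'c1-p1', 'c2-chap1']
print([el.get("linkend") for el in book.iter("xref")])           # ['c1-p1', 'c1-p1']
```

Computing every change against the unmodified tree before applying any of them matters: the global lookup and the inherited cx:* settings both have to see the original IDs and attributes, just as the stylesheet reads its input tree.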

The following attributes are recognized on the included roots:

cx:idfixup

The “idfixup” scheme to use for this transclusion.

cx:linkscope

The “linkscope” scheme to use for this transclusion.

cx:prefix

The prefix to use for this transclusion; only relevant when cx:idfixup="prefix".

cx:id

The ID value to use for the top-level element of the transcluded content.

You'll need the latest XML Calabash if you want to try it out.

On the one hand, it seems to be possible to implement several different transclusion scenarios this way. On the other hand, I'm disappointed by how much author involvement is necessary to get good results. In particular, observe that if there are nested transclusions, it's not at all clear (to me) which way the precedence should run: should settings on an outer xi:include override those on an inner one, or the other way around?
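The precedence question fits in a few lines. In this Python sketch (the `resolve` helper and the sample document are hypothetical), an inherited cx:* setting is looked up either the way the stylesheet's ancestor-or-self lookups do it, where the nearest setting wins, or the opposite way, where the outermost transclusion overrides anything nested inside it. Innermost-wins leaves the decision with whoever wrote the nested xi:include; outermost-wins gives it to whoever assembles the top-level document.

```python
import xml.etree.ElementTree as ET

CX = "{http://xmlcalabash.com/ns/extensions}"


def resolve(el, parents, name, default, innermost=True):
    """Look up cx:name on el and its ancestors.

    innermost=True: the nearest ancestor-or-self setting wins.
    innermost=False: the outermost setting overrides nested ones.
    """
    found = []  # settings from nearest to outermost
    while el is not None:
        if el.get(CX + name) is not None:
            found.append(el.get(CX + name))
        el = parents.get(el)
    if not found:
        return default
    return found[0] if innermost else found[-1]


# Hypothetical result of a nested transclusion: a chapter (prefix c1-)
# that itself transcluded a shared section (prefix s-).
DOC = """<book xmlns:cx="http://xmlcalabash.com/ns/extensions">
  <chapter cx:root="true" cx:prefix="c1-">
    <section cx:root="true" cx:prefix="s-"><para/></section>
  </chapter>
</book>"""
book = ET.fromstring(DOC)
parents = {c: p for p in book.iter() for c in p}
para = book.find("chapter/section/para")
print(resolve(para, parents, "prefix", "xxx-"))                   # s-
print(resolve(para, parents, "prefix", "xxx-", innermost=False))  # c1-
```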

Comments and suggestions most welcome, as always.

Comments

The ID fixup logic would also be used when validating the document (e.g. in an editor), correct? I.e. you wouldn't get spurious duplicate-ID errors if you include content with the same ID more than once, as long as you strip or fix the IDs.

—Posted by David Cramer on 03 Oct 2011 @ 09:39 UTC #

Yes, I think it would make sense to do validation after fixup. Pipelines FTW.

—Posted by Norman Walsh on 04 Oct 2011 @ 02:53 UTC #

There are also for options for scoping IDREF values

Typo, IMHO: "for" should be "four".

—Posted by Mr Typo on 04 Oct 2011 @ 12:44 UTC #

Thank you, Mr. Typo. Fixed.

—Posted by Norman Walsh on 04 Oct 2011 @ 01:36 UTC #

Your XML inclusions in this page appear to have vanished.

—Posted by Daniel Lyons on 10 Apr 2013 @ 06:52 UTC #

Yikes! Right you are. Will fix ASAP!

—Posted by Norman Walsh on 18 Apr 2013 @ 10:43 UTC #

Must have forgotten to flush the cached HTML after a bug fix. Works now.

—Posted by Norman Walsh on 19 Apr 2013 @ 04:05 UTC #