<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
<title>RDFa for DocBook?</title>
<biblioid class="uri">http://norman.walsh.name/2009/09/22/RDFaForDocBook</biblioid>
<volumenum>12</volumenum>
<issuenum>30</issuenum>
<pubdate>2009-09-22T14:20:39+01:00</pubdate>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2009</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>Adding RDFa to DocBook would make it possible to add a class of
semantic annotations to DocBook without changing the schema.
But is that a good idea?</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#DocBook"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#RDF"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XMLSummerSchool2009"/>
</info>

<epigraph>
<attribution>
      <personname>
<firstname>Samuel</firstname>
	<surname>Johnson</surname>
</personname>
    </attribution>
<para xml:id="p2">Knowledge is of two kinds. We know a subject ourselves, or we
know where we can find information on it.</para>
</epigraph>

<para xml:id="p1">When <personname>
      <firstname>Bob</firstname>
<surname>DuCharme</surname>
    </personname> introduced
the semantic web track at 
<link xlink:href="http://www.xmlsummerschool.com/">XML Summer School</link>
this morning, he mentioned briefly the idea of adding
<wikipedia>RDFa</wikipedia> to vocabularies
other than (X)HTML. In particular, he's investigated how to
<link xlink:href="http://www.devx.com/semantic/Article/42543/0/page/3">do it
in DocBook</link>.</para>

<para xml:id="p3">The DocBook TC gets periodic requests to add new inline elements and
attributes for bits of metadata. Sometimes the requests are entirely
legitimate, in the sense that they're clearly about technical documentation,
but seem to apply to such a small audience that the TC is reluctant to
add them to all of DocBook.</para>

<para xml:id="p4">With this in mind, the idea of adding RDFa has some appeal: we add a
few new attributes and henceforth users will be able to add new bits of
metadata without having to change the DocBook schema.</para>

<para xml:id="p5">But I'm not sure.</para>

<para xml:id="p6">First, lots of DocBook elements have more specific semantics than
HTML elements. We don't need to say:</para>

<programlisting>&lt;phrase property="dc:title"&gt;Beautiful Sunset&lt;/phrase&gt;</programlisting>

<para xml:id="p7">because we have <tag>citetitle</tag>. We don't need to say:</para>

<programlisting>&lt;info&gt;
  &lt;bibliomisc&gt;
    &lt;phrase rel="mpc:editor" href="http://mypubco.com/empid/53234"/&gt;
  &lt;/bibliomisc&gt;
&lt;/info&gt;</programlisting>

<para xml:id="p8">because we have</para>

<programlisting>&lt;info&gt;
  &lt;editor role="mpc:editor"&gt;
    &lt;personname&gt;Some Name&lt;/personname&gt;
    &lt;uri&gt;http://mypubco.com/empid/53234&lt;/uri&gt;
  &lt;/editor&gt;
&lt;/info&gt;</programlisting>

<para xml:id="p9">I'm not suggesting those are <emphasis>exactly</emphasis> the same
(they're clearly not), but I'm comfortable that existing DocBook elements
are sufficient for the task.</para>

<para xml:id="p10">(Yes, you'd need a DocBook-specific tool to extract the metadata, which
is a disadvantage, but you probably want one anyway for the existing
DocBook semantics.)</para>
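
<para>As a minimal sketch of what such a DocBook-specific extractor might
look like, the following Python reads the <tag>editor</tag> example above and
emits one subject/predicate/object statement. The function name
<function>editor_statements</function>, the subject URI, and the convention
that <tag class="attribute">role</tag> names the predicate while
<tag>uri</tag> names the object are all assumptions made for illustration;
DocBook defines no such mapping. The sample also adds the DocBook namespace
declaration that the fragment above leaves out.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

DB = "http://docbook.org/ns/docbook"  # DocBook 5 namespace

# The editor example from the text, with a namespace declaration added.
SAMPLE = """\
<info xmlns="http://docbook.org/ns/docbook">
  <editor role="mpc:editor">
    <personname>Some Name</personname>
    <uri>http://mypubco.com/empid/53234</uri>
  </editor>
</info>"""


def editor_statements(info_xml, subject):
    """Return (subject, predicate, object) tuples for each editor with a uri.

    Hypothetical convention: the role attribute names the predicate and the
    uri child names the object. DocBook itself does not define this mapping.
    """
    root = ET.fromstring(info_xml)
    statements = []
    for editor in root.iter(f"{{{DB}}}editor"):
        predicate = editor.get("role")
        uri = editor.findtext(f"{{{DB}}}uri")
        if predicate and uri:
            statements.append((subject, predicate, uri.strip()))
    return statements


if __name__ == "__main__":
    # "http://example.com/doc" is a placeholder subject, not from the essay.
    for statement in editor_statements(SAMPLE, "http://example.com/doc"):
        print(statement)
]]></programlisting>

<para>The point of the sketch is only that the extractor has to know which
DocBook elements carry which meaning, whereas a generic RDFa processor
would not.</para>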

<para xml:id="p11">Second, RDFa would allow you to construct statements with conflicting
or, at best, odd semantics:</para>

<programlisting>&lt;section&gt;
  &lt;title property="dc:creator"&gt;Alice1&lt;/title&gt;
  &lt;para&gt;This is from section 2.2.&lt;/para&gt;
&lt;/section&gt;</programlisting>

<para xml:id="p13">I can just about imagine a sense in which “Alice1” can be both the
title of a section and the <wikipedia>Dublin Core</wikipedia>
creator of the section, but it doesn't make a lot of sense.</para>
 
<para xml:id="p14">Third, Bob's example seems to suggest that RDFa would encourage
markup like this:</para>

<programlisting>&lt;para about="/alice/posts/trouble_with_bob"&gt;
  &lt;phrase property="dc:title"&gt;The trouble with Bob2&lt;/phrase&gt;
  &lt;phrase property="dc:creator"&gt;Alice2&lt;/phrase&gt;
&lt;/para&gt;</programlisting>

<para xml:id="p16">which seems like a bad idea to me.</para>

<para xml:id="p17">On the other hand, some of the examples do seem useful for exactly
the sort of small-audience metadata that motivated my interest in the first place:</para>

<programlisting>&lt;bibliomisc property="mpc:lastScreenShotDate" content="2009-08-01T15:31:00"/&gt;
&lt;bibliomisc property="mpc:softwareRelease"    content="3.1"/&gt;</programlisting>
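
<para>To make the appeal concrete, here is a rough sketch, in Python, of how
a generic consumer might pull property/value pairs out of markup like this.
It is not a conforming RDFa processor: it ignores
<tag class="attribute">about</tag>, <tag class="attribute">rel</tag>, and
prefix resolution, and the function name
<function>property_values</function> and the wrapping <tag>info</tag>
element are assumptions for illustration.</para>

<programlisting language="python"><![CDATA[import xml.etree.ElementTree as ET

# The two bibliomisc elements from the text, wrapped in an assumed info element.
SAMPLE = """\
<info xmlns="http://docbook.org/ns/docbook">
  <bibliomisc property="mpc:lastScreenShotDate" content="2009-08-01T15:31:00"/>
  <bibliomisc property="mpc:softwareRelease" content="3.1"/>
</info>"""


def property_values(xml_text):
    """Collect (property, value) pairs from every element with a property attribute.

    The value is the content attribute when present, otherwise the element's
    text. Prefixes such as mpc: are left unexpanded.
    """
    pairs = []
    for element in ET.fromstring(xml_text).iter():
        prop = element.get("property")
        if prop is None:
            continue
        value = element.get("content")
        if value is None:
            value = "".join(element.itertext()).strip()
        pairs.append((prop, value))
    return pairs


if __name__ == "__main__":
    for prop, value in property_values(SAMPLE):
        print(prop, value)
]]></programlisting>

<para>Because the function keys only on the attributes, the same code would
keep working if someone introduced a new property tomorrow, with no schema
change and no new element-specific logic.</para>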

<para xml:id="p18">In fairness, Bob set out to recreate the triples from the original
tutorial, so some of the markup choices were forced upon him.</para>

<para xml:id="p19">So I'm not sure.</para>

</essay>

