<?xml version="1.0" encoding="UTF-8"?>
<essay xml:lang="en" version="5.0" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:gal="http://norman.walsh.name/rdf/gallery#" xmlns:foaf="http://xmlns.com/foaf/0.1/">
<info>
<title>ePUB tools</title><biblioid class="uri">http://norman.walsh.name/2010/06/09/epubxpl</biblioid>
<volumenum>13</volumenum>
<issuenum>22</issuenum>
<pubdate>2010-06-09T06:24:09-04:00</pubdate>
<author>
      <personname>
<firstname>Norman</firstname>
	<surname>Walsh</surname>
</personname>
    </author>
<copyright>
      <year>2010</year>
      <holder>Norman Walsh</holder>
    </copyright>
<abstract>
<para>Want to convert your favorite specification to ePUB? Here are the tools
that I've been using. [Update: 10 June 2010] Much revised.</para>
</abstract>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#W3C"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XML"/>
<dc:subject rdf:resource="http://norman.walsh.name/knows/taxonomy#XProc"/>
</info>

<para xml:id="p1">My essay on publishing W3C specifications
<link xlink:href="../07/epub">as ePUB files</link> looks to be popular.
Before I'm inundated with requests, I figure I'd
better publish my tools so you can do the conversions yourself. They work for me;
YMMV, of course.</para>

<para xml:id="p2">As things stand right now, if the input spec is W3C
<link xlink:href="http://www.w3.org/2005/07/pubrules">pubrules
compliant</link>, stored in a single file, and uses the classes “div1”,
“div2”, “div3”, etc. for the various section levels in the spec, I
think you'll get a nice, neat ePUB file. The more your input diverges from
that structure, the harder it will be to make things work.</para>

<para xml:id="p3">There are four parts:</para>

<variablelist>
<varlistentry>
      <term xlink:href="examples/spec2epub.xpl">spec2epub.xpl</term>
<listitem>
<para xml:id="p4">This is the main
<wikipedia page="XML_pipeline">XProc</wikipedia>
pipeline. Its job is to download the spec and shepherd
everything along. It fiddles a bit with class attributes to massage XProc and a few
other specs into the right general structure.
</para>
</listitem>
</varlistentry>
<varlistentry>
      <term xlink:href="examples/getfiles.pl">getfiles.pl</term>
<listitem>
<para xml:id="p5">One of the things that we need to do is download all
the ancillary documents (images, CSS stylesheets, etc.). The pipeline
can extract the URIs, but since some of them aren't XML, it can't
download them. This script does that part.</para>
<para xml:id="p6">(Well, actually, it can download them just fine;
what it can't do is write them to disk in their original, binary form.
I have an extension to <tag>p:store</tag> that does this, but I
didn't want to use any extensions in this pipeline.)
</para>
<para xml:id="p16">[Update: 10 June 2010] This script now parses and massages
the CSS files, looking for embedded <literal>url()</literal> references.</para>
</listitem>
</varlistentry>
<varlistentry>
      <term xlink:href="examples/spec2epub.xsl">spec2epub.xsl</term>
<listitem>
<para xml:id="p7">Chunking the spec and creating the ePUB metadata
files is handled by an XSLT stylesheet.</para>
<para xml:id="p8">For W3C specifications, we can exactly and
accurately collect all the metadata (Thank you,
<personname>
	    <firstname>Ian</firstname>
	    <surname role="suppress">Jacobs</surname> </personname> and pubrules!). For the
OASIS specs, I had to hand-edit some of the metadata.</para>
<para xml:id="p17">[Update: 10 June 2010] I rewrote this stylesheet substantially. It works
a lot harder to preserve the relative structure of the spec so that it can
construct relative links to stylesheets, images, and the like. It also does
a whole bunch of (ad hoc!) markup cleanup.</para>
</listitem>
</varlistentry>
<varlistentry>
      <term xlink:href="http://www.paranoidfish.org/projects/webkit2png">webkit2png</term>
<listitem>
<para xml:id="p9">The last thing I do is generate a cover image. I do
this with a Mac-only script that converts an HTML
page into a PNG. You're on your own for this part; I can't help you.
(If someone knows of a portable way to do this, I'd love to hear about
it.)
</para>
</listitem>
</varlistentry>
</variablelist>
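
<para>To give a feel for the CSS step mentioned under
<filename>getfiles.pl</filename>, here is a rough Python sketch. It is not
the code in that script (which is Perl); it only illustrates the idea of
finding <literal>url()</literal> references in a stylesheet and resolving
them against the stylesheet's own URI so they can be downloaded. The
function name <function>css_urls</function>, the sample stylesheet, and the
<literal>example.org</literal> URI are made up for illustration, and the
regular expression is a simplification that ignores comments and escaped
characters.</para>

<programlisting language="python">import re
from urllib.parse import urljoin

# Simplified pattern: url(...) with optional single or double quotes.
# Assumption: no escaped parentheses or quotes inside the reference.
URL_REF = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_urls(css_text, css_uri):
    """Return absolute URIs for every url() reference in css_text."""
    found = []
    for match in URL_REF.finditer(css_text):
        ref = match.group(2).strip()
        # Skip empty references and inline data: URIs; nothing to download.
        if ref and not ref.lower().startswith("data:"):
            found.append(urljoin(css_uri, ref))
    return found


# Hypothetical stylesheet content and location, for illustration only.
sample = """
body { background: url("logo.png") no-repeat; }
@import url(../base.css);
"""
print(css_urls(sample, "http://example.org/TR/spec/style/main.css"))
</programlisting>

<para>Note that the references are resolved against the URI of the CSS
file, not the URI of the spec, which is why the stylesheet's location has to
be carried along with its content.</para>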

<para xml:id="p10">Download the files and run the pipeline with your
favorite XProc processor. The pipeline has no inputs and no outputs,
only options:</para>

<variablelist>
<varlistentry>
      <term>
	<option>href</option>
      </term>
<listitem>
<para xml:id="p11">The URI of the spec you want to convert. For W3C
specs, where the URI is often just a pathname, make sure you include
the trailing slash or internal URIs won't get resolved correctly.
</para>
</listitem>
</varlistentry>
<varlistentry>
      <term>
	<option>base</option>
      </term>
<listitem>
<para xml:id="p12">The directory where you want the ePUB structure written.
</para>
</listitem>
</varlistentry>
<varlistentry>
      <term>
	<option>chunkdepth</option>
      </term>
<listitem>
<para xml:id="p13">By default, specs are physically chunked at
second-level sections. For some specs, this doesn't seem quite deep
enough. You can explicitly set the depth to 1, 2, or 3. If you want to
go deeper than 3, you'll have to edit <function>f:chunk</function> in
<filename>spec2epub.xsl</filename>.</para>
</listitem>
</varlistentry>
</variablelist>

<para xml:id="p15">For example:</para>

<screen><prompt>$ </prompt><userinput>calabash spec2epub.xpl base=/tmp/xproc/ href=http://www.w3.org/TR/xproc/ chunkdepth=3</userinput></screen>
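
<para>The trailing-slash advice for <option>href</option> follows from
ordinary relative URI resolution. This small Python sketch shows the
difference; the file name <filename>diagram.png</filename> and the helper
<function>resolve</function> are made up for illustration, and the spec URI
is the one from the command above.</para>

<programlisting language="python">from urllib.parse import urljoin


def resolve(spec_uri, relative_ref):
    """Resolve a reference found in the spec against the spec's URI."""
    return urljoin(spec_uri, relative_ref)


# With the trailing slash, the last path segment is kept as a directory.
with_slash = resolve("http://www.w3.org/TR/xproc/", "diagram.png")
# Without it, "xproc" is treated as a file name and gets replaced.
without_slash = resolve("http://www.w3.org/TR/xproc", "diagram.png")

print(with_slash)     # http://www.w3.org/TR/xproc/diagram.png
print(without_slash)  # http://www.w3.org/TR/diagram.png
</programlisting>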

<para xml:id="p14">Good luck! If you make improvements, please share them!</para>

</essay>

