ePUB tools

Volume 13, Issue 22; 09 Jun 2010; last modified 08 Oct 2010

Want to convert your favorite specification to ePUB? Here are the tools that I've been using. [Update: 10 June 2010] Much revised.

Publishing W3C specifications as ePUB files looks to be a popular essay. Before I'm inundated with requests, I figure I'd better publish my tools so you can do the conversions yourself. They work for me; YMMV, of course.

As things stand right now, if the input spec is W3C pubrules compliant, stored in a single file, and uses the classes “div1”, “div2”, “div3”, etc. for the various section levels in the spec, I think you'll get a nice, neat ePUB file. The more your input diverges from that structure, the harder it will be to make things work.
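
If you want a quick sanity check before running anything, something like the following Python sketch can count how often the “div1”, “div2”, etc. classes occur in a spec. It is not part of the tools described here; the names `SectionClassCounter` and `section_levels` and the sample markup are made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class SectionClassCounter(HTMLParser):
    """Count <div> elements whose class attribute includes div1, div2, ..."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            # Only names of the form "div" + digits count as section levels.
            if name.startswith("div") and name[3:].isdigit():
                self.counts[name] += 1


def section_levels(html_text):
    """Return a dict mapping each section class to its number of uses."""
    counter = SectionClassCounter()
    counter.feed(html_text)
    return dict(counter.counts)


# Hypothetical fragment, only here to exercise the function.
sample = """
<div class="div1"><h2>1 Introduction</h2>
  <div class="div2"><h3>1.1 Scope</h3></div>
  <div class="div2 informative"><h3>1.2 Terms</h3></div>
</div>
"""
print(section_levels(sample))  # {'div1': 1, 'div2': 2}
```

An empty result suggests the spec does not use these classes, in which case the chunking will probably need more hand-holding.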

There are four parts:

spec2epub.xpl

This is the main XProc pipeline. Its job is to download the spec and shepherd everything along. It fiddles a bit with class attributes to massage XProc and a few other specs into the right general structure.

getfiles.pl

One of the things that we need to do is download all the ancillary documents (images, CSS stylesheets, etc.). The pipeline can extract the URIs, but since some of them aren't XML, it can't download them. This script does that part.

(Well, actually, it can download them just fine; what it can't do is write them to disk in their original, binary form. I have an extension to p:store that does this, but I didn't want to use any extensions in this pipeline.)

[Update: 10 June 2010] This script now parses CSS and massages the CSS files to look for embedded url()’s.

spec2epub.xsl

Chunking the spec and creating the ePUB metadata files is handled by an XSLT stylesheet.

For W3C specifications, we can exactly and accurately collect all the metadata (Thank you, Ian and pubrules!). For the OASIS specs, I had to hand edit some of the metadata.

[Update: 10 June 2010] I rewrote this stylesheet substantially. It works a lot harder to preserve the relative structure of the spec so that it can construct relative links to stylesheets, images, and the like. It also does a whole bunch of (ad hoc!) markup cleanup.

webkit2png

The last thing I do is generate a cover image. I do this with a platform-dependent, Mac-only script that converts an HTML page into a PNG. You're on your own for this part; I can't help you. (If someone knows of a portable way to do this, I'd love to hear about it.)
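
To give a feel for the CSS handling mentioned under getfiles.pl, here is a minimal Python sketch of finding url() references in a stylesheet and resolving them against the stylesheet's own URI. It is not the Perl script's actual code; the function name `css_urls`, the regular expression, and the example.org URIs are all assumptions for illustration, and a regular expression like this will not cope with every corner of CSS syntax.

```python
import re
from urllib.parse import urljoin

# Matches url(foo.png), url('foo.png') and url("foo.png").
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_urls(css_text, css_uri):
    """Return absolute URIs for every url() reference in css_text."""
    found = []
    for match in CSS_URL.finditer(css_text):
        ref = match.group(2).strip()
        # Skip empty references and inline data: URIs; nothing to download.
        if ref and not ref.startswith("data:"):
            found.append(urljoin(css_uri, ref))
    return found


# Hypothetical stylesheet content and location.
css = """
body { background: url(images/bg.png); }
h1 { background: url("../logo.png") no-repeat; }
"""
for uri in css_urls(css, "http://example.org/style/base.css"):
    print(uri)
# http://example.org/style/images/bg.png
# http://example.org/logo.png
```

The point to notice is that references inside a stylesheet resolve against the stylesheet, not against the spec that links to it.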

Download the files and run the pipeline with your favorite XProc processor. The pipeline has no inputs and no outputs, only options:

href

The URI of the spec you want to convert. For W3C specs, where the URI is often just a pathname, make sure you include the trailing slash or internal URIs won't get resolved correctly.

base

The directory where you want the ePUB structure written.

chunkdepth

By default, specs are physically chunked at second-level sections. For some specs, this doesn't seem quite deep enough. You can explicitly set the depth to 1, 2, or 3. If you want to go deeper than 3, you'll have to edit f:chunk in spec2epub.xsl.

For example:


    $ calabash spec2epub.xpl base=/tmp/xproc/ href=http://www.w3.org/TR/xproc/ chunkdepth=3
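
The trailing slash on href matters because of how relative URIs are resolved: without it, the last path segment is treated as a file name and dropped. A small Python illustration using the standard library's `urljoin` (the relative path `images/fig1.png` is a made-up example, not a file known to exist in the spec):

```python
from urllib.parse import urljoin

# Hypothetical relative reference found inside the spec.
relative = "images/fig1.png"

with_slash = urljoin("http://www.w3.org/TR/xproc/", relative)
without_slash = urljoin("http://www.w3.org/TR/xproc", relative)

print(with_slash)     # http://www.w3.org/TR/xproc/images/fig1.png
print(without_slash)  # http://www.w3.org/TR/images/fig1.png
```

The second result points outside the spec's directory, which is the kind of misresolved internal URI the href option warns about.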
  

Good luck! If you make improvements, please share them!

Comments

Hi,

(liked it & ) I tried the conversion: the xpl-file seems to reference a file in your (local) home directory:

"......"

Do you have that file also?

Daniel

—Posted by Daniel Koller on 12 Jun 2010 @ 08:06 UTC #

...solved the first problem regarding tee.xpl (found the code in one of your earlier answer to another earlier comment)

Now I do not understand which command you are referring to when calling " "

I am executing the script on a mac, also webkit2png runs, but the command 'convert' cannot be found: which tool are you referring to here?

Daniel

—Posted by Daniel Koller on 12 Jun 2010 @ 08:41 UTC #