ePUB, second attempt

Volume 13, Issue 23; 10 Jun 2010; last modified 08 Oct 2010

Playing with ePUB. Validated ePUB this time.

My initial forays into ePUB-land were pretty lame, at least in the sense that what I produced bore only a passing resemblance to valid ePUB. (Ironic considering my general affinity for validation.)

I've updated the originals, so now they're valid according to epubcheck.

In addition to getting a bunch of mechanics right, doing some more aggressive parsing, and making sure I had (and only referred to) local copies of things like CSS files, validity meant throwing out a fair bit of markup.

The subset of XHTML that ePUB mandates is fairly narrow: “id” attributes, not “name” attributes; no color, alignment, or width indicators in tables; no “clear” attribute on br elements; no type attributes on lists; no fonts; etc.
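As a rough illustration of that kind of cleanup, here is a minimal Python sketch that strips the attributes mentioned above from an XHTML tree. It is not the code I actually used: the DISALLOWED table and the helper names are hypothetical, "bgcolor" is my guess at what a "color indicator" means, and the real rule set is larger. Removing font elements is left out entirely.

```python
import xml.etree.ElementTree as ET

# Attributes to drop, keyed by element name. This table is an illustrative
# assumption based on the constraints listed above, not the full ePUB rule set.
DISALLOWED = {
    "table": {"align", "width", "bgcolor"},
    "tr": {"align", "bgcolor"},
    "td": {"align", "width", "bgcolor"},
    "th": {"align", "width", "bgcolor"},
    "br": {"clear"},
    "ul": {"type"},
    "ol": {"type"},
}


def local_name(tag):
    """Return the element name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def clean_for_epub(root):
    """Strip disallowed attributes in place and turn a/@name into a/@id."""
    for element in root.iter():
        name = local_name(element.tag)
        if name == "a" and "name" in element.attrib:
            value = element.attrib.pop("name")
            # Keep an existing id if there is one; otherwise reuse the name.
            element.attrib.setdefault("id", value)
        for attribute in DISALLOWED.get(name, ()):
            element.attrib.pop(attribute, None)
    return root


doc = ET.fromstring(
    '<div><a name="intro"/><br clear="all"/>'
    '<ul type="disc"><li>item</li></ul>'
    '<table width="100%"><tr><td align="left">cell</td></tr></table></div>'
)
clean_for_epub(doc)
print(ET.tostring(doc, encoding="unicode"))
```

A table-driven approach like this keeps the list of forbidden attributes in one place, which makes it easy to extend as epubcheck reports new complaints.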

I don't actually object to any of those constraints, though some of them seem a little strange given that I'd expect most ePUB readers are built on top of existing HTML toolkits. (But maybe that's not the case, or wasn't historically the case.) In fairness, the Kindle can't even render a simple bulleted list correctly, so perhaps I'm overestimating the situation.

Limited markup choices are ok if you're starting from a vocabulary like DocBook; you can simply plan your transformation accordingly. But for the W3C/OASIS specifications, I started with HTML; it seems a bit risky to throw away markup that might be relevant.

On top of that, my ePUB reader is a web browser; it can handle full HTML just fine. So I also converted the specifications without doing all that markup cleanup. If they work for you, you might prefer them. Here they are:

These look great in my reader, which I can't wait to show you!


Norm, I downloaded, as a test, the XML 1.1 document on my Mac using Adobe Digital Editions, and also on my iPad using iBooks. It reads really, really well.

What is involved if I want to install a similar pipeline on my machine?



—Posted by Ivan Herman on 08 Sep 2010 @ 04:49 UTC #

Ivan, if you haven't figured this out already, there are a ton of ways to compile an epub. Basically it's a bunch of HTML files zipped up, plus:

OPF file: a manifest declaring the files and specifying their MIME types, and a reading order (a spine)

NCX file: a hyperlinked table of contents

For a good GUI tool I recommend Calibre or something similar.

If you want control, clean markup, etc., then just produce valid XHTML files and bundle them together using one of the plethora of tools. At one point I used a Ruby gem for this; it was buggy, but worked well enough for me. Unfortunately there are a few things in OPF + zipping up that can't be done in pure XSL; otherwise I would have written an XSL that compiles an epub a long time ago...
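To make the bundling step concrete, here is a minimal Python sketch of an ePUB 2 style package: a mimetype entry, a container file pointing at the OPF, the OPF manifest and spine, an NCX table of contents, and one XHTML file. The file names (OEBPS/content.opf, toc.ncx, ch1.xhtml), the placeholder identifier, and the build_epub helper are all made up for illustration, and I haven't run the result through epubcheck. The zip-level detail is that the mimetype entry goes first and is stored uncompressed.

```python
import io
import zipfile

BOOK_ID = "urn:uuid:00000000-0000-0000-0000-000000000000"  # placeholder identifier

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# OPF: metadata, a manifest of files with MIME types, and the spine (reading order).
OPF = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{BOOK_ID}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
  </spine>
</package>
"""

# NCX: the hyperlinked table of contents.
NCX = f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{BOOK_ID}"/></head>
  <docTitle><text>Example</text></docTitle>
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="ch1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""

CHAPTER = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Hello, ePUB.</p></body>
</html>
"""


def build_epub(target):
    """Write a one-chapter ePUB to a filename or a binary file-like object."""
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, data in [
            ("META-INF/container.xml", CONTAINER),
            ("OEBPS/content.opf", OPF),
            ("OEBPS/toc.ncx", NCX),
            ("OEBPS/ch1.xhtml", CHAPTER),
        ]:
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)


# Build in memory here; pass a filename such as "example.epub" to write to disk.
buffer = io.BytesIO()
build_epub(buffer)
print(zipfile.ZipFile(buffer).namelist())
```

A real pipeline would generate the manifest, spine, and navMap from the list of XHTML files rather than hard-coding them.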

—Posted by Lech Rzedzicki on 11 Oct 2010 @ 04:33 UTC #