ePUB, second attempt

Volume 13, Issue 23; 10 Jun 2010; last modified 08 Oct 2010

Playing with ePUB. Validated ePUB this time.

My initial forays into ePUB-land were pretty lame, at least in the sense that what I produced bore only a passing resemblance to valid ePUB. (Ironic considering my general affinity for validation.)

I've updated the originals, so now they're valid according to epubcheck.

In addition to getting a bunch of mechanics right, doing some more aggressive parsing, and making sure I had (and only referred to) local copies of things like CSS files, it meant throwing out a fair bit of markup.

The subset of XHTML that ePUB mandates is fairly narrow: “id” attributes, not “name” attributes; no color, alignment, or width indicators in tables; no “clear” attribute on br elements; no type attributes on lists; no fonts; etc.
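
A minimal sketch of that kind of cleanup, assuming well-formed XHTML input and only Python's standard library. This is not the pipeline used to produce the files here: the function name clean_xhtml and the attribute table are hypothetical, read off the list above, and font elements are left untouched.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML_NS)

# Hypothetical table of attributes to drop, keyed by local element name.
# It is an illustrative reading of the constraints above, not the ePUB spec.
DROP = {
    "table": {"align", "width", "bgcolor"},
    "tr": {"align", "bgcolor"},
    "td": {"align", "width", "bgcolor"},
    "th": {"align", "width", "bgcolor"},
    "br": {"clear"},
    "ol": {"type"},
    "ul": {"type"},
    "li": {"type"},
}


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def clean_xhtml(source):
    """Return source with the attributes listed in DROP removed.

    Anchors that carry a "name" attribute get it moved to "id",
    unless an "id" is already present.
    """
    root = ET.fromstring(source)
    for elem in root.iter():
        name = local_name(elem.tag)
        for attr in DROP.get(name, ()):
            elem.attrib.pop(attr, None)
        if name == "a" and "name" in elem.attrib:
            anchor = elem.attrib.pop("name")
            elem.attrib.setdefault("id", anchor)
    return ET.tostring(root, encoding="unicode")
```

Calling clean_xhtml on a page that contains a named anchor and a br with a “clear” attribute returns the same page with the anchor carrying an “id” instead and the “clear” attribute gone. Anything not listed in DROP passes through unchanged, so the result still has to go through epubcheck.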

I don't actually object to any of those constraints, though some of them seem a little strange given that I'd expect most ePUB readers are built on top of existing HTML toolkits. (But maybe that's not the case, or wasn't historically the case.) In fairness, the Kindle can't even render a simple bulleted list correctly so perhaps I'm overestimating the situation.

Limited markup choices are ok if you're starting from a vocabulary like DocBook; you can simply plan your transformation accordingly. But for the W3C/OASIS specifications, I started with HTML; it seems a bit risky to throw away markup that might be relevant.

On top of that, my ePUB reader is a web browser; it can handle full HTML just fine. So I also converted the specifications without doing all that markup cleanup. If they work for you, you might prefer them. Here they are:

RELAX NG Compact Syntax
RELAX NG DTD Compatibility
Guidelines for using W3C XML Schema Datatypes with RELAX NG
RELAX NG
XForms 1.1
XInclude
XLink 1.1
XML Catalogs
XML Namespaces
XML Namespaces 1.1
XML
XML 1.1
XML Base
XML Schema Part 0
XML Schema Part 1
XML Schema Part 2
XML Schema Part 1, chunked on third level sections
XML Schema Part 2, chunked on third level sections
XPath/XQuery Data Model
XPath/XQuery Full Text
XPath/XQuery Functions
XPath 2.0
XProc
XProc, chunked on third level sections
XPath/XQuery Formal Semantics
XQuery
XQueryX
XSL FO 1.1
XSLT/XQuery Serialization
XSLT 2.0
XSLT 2.1

These look great in my reader, which I can't wait to show you!

Comments

Norm, I downloaded, as a test, the XML 1.1 document on my Mac using Adobe Digital Editions, and also on my iPad using iBooks. It reads really, really well.

What is involved if I want to install a similar pipeline on my machine?

Thanks!

Ivan

Ivan - if you haven't figured this out already - there are a ton of ways to compile an epub. Basically it's a bunch of HTML files zipped up, plus:

OPF file: a manifest specifying the MIME types of the files, and a reading order (a spine)
NCX file: a hyperlinked table of contents

For a good GUI tool I recommend Calibre and similar.

If you want control, clean markup, etc., then just produce valid XHTML files and bundle them together using one of the plethora of tools. At one point I used a Ruby gem for this; it was buggy, but worked well enough for me. Unfortunately there are a few things in OPF + zipping up that can't be done in pure XSL, otherwise I would have written an XSL that compiles an epub a long time ago...
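
For anyone curious what the bundling step looks like, here is a minimal sketch in Python, assuming ePUB 2 and only the standard library. It is not the pipeline behind the files in the post, and the output has not been run through epubcheck. The function names, the OEBPS directory, and the content.opf and toc.ncx file names are assumptions chosen for the example. The one part that really is fixed is the zipping detail: the mimetype entry has to come first in the archive and be stored uncompressed.

```python
import zipfile
from xml.sax.saxutils import escape, quoteattr

# Fixed file that tells a reader where the OPF package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_opf(title, book_id, chapters):
    """Return the OPF package document: metadata, manifest, and spine."""
    items = "\n".join(
        f'    <item id="ch{n}" href={quoteattr(href)} media-type="application/xhtml+xml"/>'
        for n, (href, _label, _body) in enumerate(chapters, 1)
    )
    refs = "\n".join(f'    <itemref idref="ch{n}"/>' for n in range(1, len(chapters) + 1))
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{escape(book_id)}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
{items}
  </manifest>
  <spine toc="ncx">
{refs}
  </spine>
</package>
"""


def build_ncx(title, book_id, chapters):
    """Return the NCX table of contents, one navPoint per chapter."""
    points = "\n".join(
        f'    <navPoint id="nav{n}" playOrder="{n}">\n'
        f"      <navLabel><text>{escape(label)}</text></navLabel>\n"
        f"      <content src={quoteattr(href)}/>\n"
        f"    </navPoint>"
        for n, (href, label, _body) in enumerate(chapters, 1)
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content={quoteattr(book_id)}/>
  </head>
  <docTitle><text>{escape(title)}</text></docTitle>
  <navMap>
{points}
  </navMap>
</ncx>
"""


def write_epub(target, title, book_id, chapters):
    """Write an ePUB to target (a path or a binary file object).

    chapters is a list of (href, label, xhtml_text) tuples in reading order.
    """
    deflate = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry must be first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=deflate)
        zf.writestr("OEBPS/content.opf", build_opf(title, book_id, chapters), compress_type=deflate)
        zf.writestr("OEBPS/toc.ncx", build_ncx(title, book_id, chapters), compress_type=deflate)
        for href, _label, body in chapters:
            zf.writestr(f"OEBPS/{href}", body, compress_type=deflate)
```

Something like write_epub("xml11.epub", "XML 1.1", "urn:uuid:...", chapters) would then produce a single file, where the file name and identifier are placeholders and chapters holds XHTML you have already produced and cleaned up. The manifest here only lists the chapters, so CSS files and images would need their own item entries and their own writestr calls.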