<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet href="/style/browser.xsl" type="text/xsl"?>
<essay xmlns="http://docbook.org/ns/docbook"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
       xmlns:dc='http://purl.org/dc/elements/1.1/'
       xmlns:dcterms="http://purl.org/dc/terms/"
       xmlns:gal='http://norman.walsh.name/rdf/gallery#'
       version="pto">
<info>
<title>webarch.pdf</title>
<volumenum>7</volumenum>
<issuenum>206</issuenum>
<pubdate>2004-12-07T16:23:09-08:00</pubdate>
<date>$Date: 2005-09-11 10:27:02 -0400 (Sun, 11 Sep 2005) $</date>
<author><personname>
<firstname>Norman</firstname><surname>Walsh</surname>
</personname></author>
<copyright><year>2004</year><holder>Norman Walsh</holder></copyright>
<abstract>
<para>Thoughts on producing quality printed output; specifically, a
nice printed version of Architecture of the World Wide Web.
[Update: added a pointer to the Recommendation PDF.]</para>
</abstract>
</info>

<para xml:id='p1'>[WebArch is now a Recommendation. I've written
<link xlink:href="../15/webarch">a short essay</link> about that which
includes pointers
to a slightly modified stylesheet and to the resulting PDF.]</para>

<para xml:id='p2'>I read a lot of specifications. Most of the time, I read them
online. I know a few folks who assiduously avoid paper altogether,
but I am not one of those people. For detailed review of a spec, I
print it out and read it with a red pen in hand.</para>

<sidebar>
<title>Sidebar: Leaving paper behind</title>
<para xml:id='p3'>I'll tell you right now what it would take to get me to leave
paper behind: a Firefox extension that could do a reasonable job of
annotation, even when I'm offline. Let me click and type my annotations. Let
me save them. Let me view and edit them when I come back later. And, of course,
let me easily send them back to the document author.</para>
<para xml:id='p4'>I think that's probably within the realm of possibility for
Firefox. And I'll go one better: make it work cross-platform, make
it open source, make the annotation reader free, so the author doesn't
have to pay to read my annotations, and I'll pay money for the annotation
editing tool.</para>
</sidebar>

<para xml:id='p5'>This brings me to an obvious point, one I hardly need
to make in this crowd: web browsers suck at printing.
Never mind that some browsers do a better job than others;
they all suck. And CSS is never going to fix it. Did you hear me? CSS
is never going to fix it.
There are lots of programs that can produce more or less nice
looking pages. <application>TeX</application> is an historical
favorite, as is <application>troff</application>. More modern tools
include various desktop publishing packages. In the XML world, the
obvious tool is XSL, the
<link xlink:href="http://www.w3.org/TR/xsl/">Extensible Stylesheet Language</link>, not
the <link xlink:href="http://www.w3.org/TR/xslt">Transformation</link>
language.</para>

<para xml:id='p6'>It's important to realize, however, that 
XSL is an incomplete answer. You see, XSL is a
constraint language. In XSL, you can specify how large the pages are,
how many columns they have, the sizes of fonts, and a myriad other
parameters. What you don't specify directly is where the page breaks
occur, which words get hyphenated, or exactly where any of the
actual marks are going to wind up on paper.</para>

<para xml:id='p7'>The XSL Formatting Objects (FO) document is input to a
formatter, a composition tool that renders marks on paper, typically
these days
in the form of a PDF file. Producing quality printed output is
devilishly hard. Of all the various sorts of software systems I've
encountered, a formatter is hands down the hardest to implement
well.</para>

<para xml:id='p8'>There are several commercial formatters out there that do an
adequate job. There are also a few free formatters that do a somewhat
less adequate job. I desperately wish the quality of the free formatters
would improve, but see the previous paragraph.</para>

<para xml:id='p9'>So where does all this lead? For a start, it leads to
<citetitle>Architecture of the World Wide Web</citetitle>. As one of
the editors of that document, and as a long-time participant in the
design of XSL, I really wanted to be able to render it on paper in a
reasonably professional looking form with XSL.</para>

<para xml:id='p10'>To that end, I crafted an XSLT Stylesheet that would
transform the XHTML of the specification into XSL
<acronym>FO<alt>Formatting Objects</alt></acronym>
so that I could produce PDF with <application>xep</application>.
Herewith a few notes on that process.</para>

<section xml:id="webarch">
<title>Formatting the WebArch document</title>

<para xml:id='p11'>The WebArch document is authored in a dialect of XHTML. I say dialect
because although its original sources are valid XHTML, they aren't quite
the same XHTML that gets presented in the final specification. A series
of transformations is applied to the sources. In order to produce the
PDF, I decided to start with the transformed XHTML version of the specification,
the
<link xlink:href="http://www.w3.org/TR/webarch/">document you view</link>,
not the original sources.</para>

<para xml:id='p12'>In principle, transforming to XSL Formatting Objects is as
straightforward as any other transformation. Starting with XHTML, you can see that
most of the block structures are going to get transformed to
<tag>fo:block</tag>s and most of the inline structures are going to get
transformed to <tag>fo:inline</tag>s.</para>
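<para>A minimal sketch of that mapping might look like the following.
It is not taken from the actual stylesheet: the choice of elements, the
property values, and the <literal>h</literal> namespace prefix are all
assumptions made for illustration.</para>

<programlisting language="xml"><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <!-- Block structure: an XHTML paragraph becomes an fo:block -->
  <xsl:template match="h:p">
    <fo:block space-before="0.5em" space-after="0.5em">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- Inline structure: emphasis becomes an italic fo:inline -->
  <xsl:template match="h:em">
    <fo:inline font-style="italic">
      <xsl:apply-templates/>
    </fo:inline>
  </xsl:template>

  <!-- Inline structure: code becomes a monospace fo:inline -->
  <xsl:template match="h:code">
    <fo:inline font-family="monospace">
      <xsl:apply-templates/>
    </fo:inline>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>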

<para xml:id='p13'>The tricky part is that
<acronym>FO<alt>Formatting Objects</alt></acronym>
documents have a fair bit of preamble at the front. The preamble is where
you tell the formatter the size and shape of each page; you have to create
a template, called a “master”, for each kind of page that will appear
in your document. If you've never thought about composition in these
terms, it may be a little hard to get your head around it.</para>

<section xml:id="page-masters">
<title>Setting up the page masters</title>

<para xml:id='p14'>If you have a book nearby<footnote><para xml:id='p15'>For the pedantic, a
book written in a language presented left-to-right and top-to-bottom.
Books with other orientations or writing directions will likely show
analogous variation, though that is by no means true for all languages
in all writing directions.</para></footnote>, pick it up and flip
through it. While every page is probably different, odds are good that
you will be able to find four different page layouts in the body of
the book. First, left-hand (even numbered or “verso”) pages probably
differ from right-hand (odd numbered or “recto”) pages. Look at the
headers and footers: they are often mirror images of each other with,
for example, page numbers in the outer corners of each page. Close
inspection will probably also reveal that the margin on the “binding
edge” of the page is a little wider than the margin on the other side.
In many books, the first page of each chapter or section is different
from both the left- and right-hand pages, perhaps having different or
absent running headers or footers. The fourth layout style is for
blank pages, if there are any. It is common for all chapters to begin
on an odd page, so if a chapter ends on an odd page, then a blank
“even” page is inserted to force the next chapter to also begin on an
odd page. Like the first page of a chapter, the blank page is often
distinguished from other even pages by different or absent headers and
footers. This is also the page that is sometimes annotated “This page
intentionally left blank”.</para>

<para xml:id='p16'>Each of these page layouts is defined by a “master” with a specific
name. After all the individual page masters have been created, you have
to create a page sequence master. In XSL FO terms, a document consists
of one or more page sequences. Each page sequence has a master that
is a collection of individual page masters. For the WebArch document,
there's only one page sequence, but in a book there might be different
sequences for front matter, body, and back matter.</para>

<para xml:id='p17'>For WebArch, the page sequence master defines a master for the
first page, for odd pages, for even pages, and for blank pages<footnote>
<para xml:id='p18'>No, there won't actually be any blank pages, but I defined the
master anyway. It doesn't do any harm.</para></footnote>.</para>
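<para>As a rough sketch, a layout master set along these lines could
look like the fragment below. The master names, page dimensions, and
margins are hypothetical and are not copied from the WebArch stylesheet;
the wider margin simply alternates sides to follow the binding edge.</para>

<programlisting language="xml"><![CDATA[<fo:layout-master-set xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- One simple-page-master per kind of page (names are hypothetical) -->
  <fo:simple-page-master master-name="first"
      page-width="8.5in" page-height="11in"
      margin-top="0.5in" margin-bottom="0.5in"
      margin-left="1.25in" margin-right="1in">
    <fo:region-body margin-top="0.75in" margin-bottom="0.75in"/>
  </fo:simple-page-master>

  <!-- Odd (recto) pages: binding edge, and so the wider margin, on the left -->
  <fo:simple-page-master master-name="odd"
      page-width="8.5in" page-height="11in"
      margin-top="0.5in" margin-bottom="0.5in"
      margin-left="1.25in" margin-right="1in">
    <fo:region-body margin-top="0.75in" margin-bottom="0.75in"/>
  </fo:simple-page-master>

  <!-- Even (verso) pages: binding edge on the right -->
  <fo:simple-page-master master-name="even"
      page-width="8.5in" page-height="11in"
      margin-top="0.5in" margin-bottom="0.5in"
      margin-left="1in" margin-right="1.25in">
    <fo:region-body margin-top="0.75in" margin-bottom="0.75in"/>
  </fo:simple-page-master>

  <!-- Blank pages inserted to force an odd start -->
  <fo:simple-page-master master-name="blank"
      page-width="8.5in" page-height="11in"
      margin-top="0.5in" margin-bottom="0.5in"
      margin-left="1in" margin-right="1.25in">
    <fo:region-body margin-top="0.75in" margin-bottom="0.75in"/>
  </fo:simple-page-master>

  <!-- The page sequence master picks one of the masters for each page -->
  <fo:page-sequence-master master-name="body">
    <fo:repeatable-page-master-alternatives>
      <fo:conditional-page-master-reference master-reference="blank"
          blank-or-not-blank="blank"/>
      <fo:conditional-page-master-reference master-reference="first"
          page-position="first"/>
      <fo:conditional-page-master-reference master-reference="odd"
          odd-or-even="odd"/>
      <fo:conditional-page-master-reference master-reference="even"
          odd-or-even="even"/>
    </fo:repeatable-page-master-alternatives>
  </fo:page-sequence-master>

</fo:layout-master-set>]]></programlisting>

<para>The formatter tries the conditional references in order and uses
the first one whose conditions match the page being generated, which is
why the blank and first-page alternatives are listed ahead of the
general odd and even ones.</para>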

</section>

<section xml:id="static-content">
<title>Setting up the headers and footers</title>

<para xml:id='p19'>We're now almost ready to start generating FO markup for the
document content, but there's one more little hurdle. Every FO page has
five regions: the main body region in the center where the document goes,
and four more regions around the edges for top, bottom, left, and right
material. The top and bottom regions are used for headers and footers.
If you look at the stylesheet, you'll see that each of these regions in
each page master has a name. As soon as we've started a page sequence,
we'll refer to these regions by name and fill in their content. The
formatter will use this “static content” in the appropriate place on
each page. The content is static in the sense that content from the
document doesn't “flow” into it. It can change on a per-page basis,
as we'll see.</para>
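<para>To make the naming concrete, here is a small, self-contained FO
document, offered as a sketch only. The region names
(<literal>odd-header</literal>, <literal>odd-footer</literal>), the
dimensions, and the text are invented for this example; regions that are
not given a name fall back to defaults such as
<literal>xsl-region-body</literal>.</para>

<programlisting language="xml"><![CDATA[<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="odd"
        page-width="8.5in" page-height="11in"
        margin-top="0.5in" margin-bottom="0.5in"
        margin-left="1in" margin-right="1in">
      <!-- The body margins leave room for the four edge regions -->
      <fo:region-body margin-top="0.75in" margin-bottom="0.75in"
                      margin-left="0.5in" margin-right="0.5in"/>
      <fo:region-before region-name="odd-header" extent="0.5in"/>
      <fo:region-after region-name="odd-footer" extent="0.5in"/>
      <fo:region-start extent="0.25in"/>
      <fo:region-end extent="0.25in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="odd">
    <!-- Static content is matched to a region by its flow-name -->
    <fo:static-content flow-name="odd-header">
      <fo:block text-align="end">Running header text</fo:block>
    </fo:static-content>
    <fo:static-content flow-name="odd-footer">
      <fo:block text-align="end"><fo:page-number/></fo:block>
    </fo:static-content>

    <!-- Document content flows into the body region -->
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Document content goes here.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>]]></programlisting>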

<para xml:id='p20'>Without going into a lot of detail here, if you look in the stylesheet,
you'll see that I use tables to format the running headers and footers,
placing the page numbers, for example, on the left side of left pages
and the right side of right pages. Some masters, like the first page,
have empty headers and/or empty footers.</para>
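<para>The general shape of a table-based footer is sketched below. This
is an illustration rather than an excerpt from the stylesheet: the
region names, the column proportions, and the master named
<literal>body</literal> (which would have to be defined in the layout
master set) are assumptions.</para>

<programlisting language="xml"><![CDATA[<fo:page-sequence xmlns:fo="http://www.w3.org/1999/XSL/Format"
                  master-reference="body">

  <!-- Left-hand (even) pages: page number in the outer, left corner -->
  <fo:static-content flow-name="even-footer">
    <fo:table table-layout="fixed" width="100%">
      <fo:table-column column-width="proportional-column-width(1)"/>
      <fo:table-column column-width="proportional-column-width(4)"/>
      <fo:table-body>
        <fo:table-row>
          <fo:table-cell>
            <fo:block text-align="start"><fo:page-number/></fo:block>
          </fo:table-cell>
          <fo:table-cell>
            <fo:block text-align="end">Footer text</fo:block>
          </fo:table-cell>
        </fo:table-row>
      </fo:table-body>
    </fo:table>
  </fo:static-content>

  <!-- Right-hand (odd) pages: the mirror image -->
  <fo:static-content flow-name="odd-footer">
    <fo:table table-layout="fixed" width="100%">
      <fo:table-column column-width="proportional-column-width(4)"/>
      <fo:table-column column-width="proportional-column-width(1)"/>
      <fo:table-body>
        <fo:table-row>
          <fo:table-cell>
            <fo:block text-align="start">Footer text</fo:block>
          </fo:table-cell>
          <fo:table-cell>
            <fo:block text-align="end"><fo:page-number/></fo:block>
          </fo:table-cell>
        </fo:table-row>
      </fo:table-body>
    </fo:table>
  </fo:static-content>

  <fo:flow flow-name="xsl-region-body">
    <fo:block>Document content goes here.</fo:block>
  </fo:flow>
</fo:page-sequence>]]></programlisting>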

<para xml:id='p21'>At this point we can “apply templates” on the body and our FO
document will come out.</para>
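<para>Put together, the top of such a stylesheet might be sketched as
follows. The single page master, the <literal>h</literal> prefix, and
the catch-all template for the children of <tag>body</tag> are
simplifying assumptions for this example; a real stylesheet would supply
a template for each element it expects.</para>

<programlisting language="xml"><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <xsl:template match="/">
    <fo:root>
      <!-- Preamble: page masters come first -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-width="8.5in" page-height="11in"
            margin-top="1in" margin-bottom="1in"
            margin-left="1in" margin-right="1in">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <!-- One page sequence; the document flows into the body region -->
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="h:html/h:body/*"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- Simplification: wrap every top-level child of body in a block -->
  <xsl:template match="h:body/h:*">
    <fo:block space-after="0.5em">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>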
</section>

<section xml:id="of-note">
<title>Also of note</title>

<para xml:id='p22'>Two other parts of the stylesheet are perhaps notable: PDF bookmarks
and the use of markers. Bookmarks will be a standard
feature of XSL 1.1, but for the moment, I'm relying on a 
<application>xep</application> extension. Markers are more interesting.</para>

<para xml:id='p23'>Markers provide a mechanism for adjusting the running headers
and footers as you progress through a document. Think of the way that
headers and footers change as you flip through a dictionary: markers let
you do that.</para>

<para xml:id='p24'>For WebArch, I decided to put the current first- or second-level
section in the footer of each page. That way you can tell just where
you are. It may prove to be more distracting than useful, but I figured
there'd be no way to tell without trying it.</para>

<para xml:id='p25'>Markers are easy to use. Whenever you output content that should
appear in a header or footer, you output an <tag>fo:marker</tag>.
In the static content for the appropriate header or footer, you use
<tag>fo:retrieve-marker</tag>. The formatter will replace the
<tag>fo:retrieve-marker</tag> with the content of the appropriate <tag>fo:marker</tag>.
</para>
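<para>A small, self-contained FO document shows how the two elements are
paired. It is a sketch only: the marker class name
<literal>section-title</literal>, the region name
<literal>footer</literal>, and the section title text are made up for
this example.</para>

<programlisting language="xml"><![CDATA[<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-width="8.5in" page-height="11in"
        margin-top="0.5in" margin-bottom="0.5in"
        margin-left="1in" margin-right="1in">
      <fo:region-body margin-bottom="0.75in"/>
      <fo:region-after region-name="footer" extent="0.5in"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <fo:page-sequence master-reference="page">
    <!-- The footer pulls in whichever marker applies to the current page -->
    <fo:static-content flow-name="footer">
      <fo:block text-align="end">
        <fo:retrieve-marker retrieve-class-name="section-title"
            retrieve-position="first-including-carryover"
            retrieve-boundary="page-sequence"/>
      </fo:block>
    </fo:static-content>

    <fo:flow flow-name="xsl-region-body">
      <!-- The marker must come first inside its parent -->
      <fo:block font-weight="bold">
        <fo:marker marker-class-name="section-title">Example section title</fo:marker>
        Example section title
      </fo:block>
      <fo:block>Section content goes here.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>]]></programlisting>

<para>The <tag class="attribute">retrieve-position</tag> property
controls which marker wins when several fall on the same page;
<literal>first-including-carryover</literal> also considers a marker
carried over from an earlier page when the current page has none of its
own before the first new one.</para>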

<para xml:id='p26'>Careful inspection of the stylesheet will reveal several places
where I've taken care to avoid an obvious compositional faux pas like
leaving a single list item on the top or bottom of a page or allowing
a page break to occur immediately after a section title.</para>
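<para>Constraints of this kind are usually expressed with the keep
properties. The templates below are a sketch of one way to do it, not
the actual code from the stylesheet; the element choices and the
<literal>h</literal> prefix are assumptions.</para>

<programlisting language="xml"><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <!-- Never break the page directly after a section title -->
  <xsl:template match="h:h2">
    <fo:block font-weight="bold" keep-with-next.within-page="always">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="h:ul">
    <fo:list-block provisional-distance-between-starts="1.5em"
                   provisional-label-separation="0.5em">
      <xsl:apply-templates select="h:li"/>
    </fo:list-block>
  </xsl:template>

  <xsl:template match="h:ul/h:li">
    <fo:list-item>
      <!-- Do not strand the first item at the bottom of a page -->
      <xsl:if test="not(preceding-sibling::h:li) and following-sibling::h:li">
        <xsl:attribute name="keep-with-next.within-page">always</xsl:attribute>
      </xsl:if>
      <!-- Do not strand the last item at the top of a page -->
      <xsl:if test="not(following-sibling::h:li) and preceding-sibling::h:li">
        <xsl:attribute name="keep-with-previous.within-page">always</xsl:attribute>
      </xsl:if>
      <fo:list-item-label end-indent="label-end()">
        <fo:block>&#x2022;</fo:block>
      </fo:list-item-label>
      <fo:list-item-body start-indent="body-start()">
        <fo:block><xsl:apply-templates/></fo:block>
      </fo:list-item-body>
    </fo:list-item>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>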

<para xml:id='p27'>Readers with design skill could certainly improve the presentation.
</para>

</section>
</section>

<section xml:id="bits">
<title>Getting the bits</title>

<para xml:id='p28'>If you want to play with it, you can get both
<link xlink:href="http://www.w3.org/TR/webarch/">the document</link> and
<link xlink:href="http://www.w3.org/2001/tag/webarch/html2fo.xsl">the stylesheet</link>
from the W3C site. I've also got 
<link xlink:href="examples/html2fo.xsl">a local copy</link> of the stylesheet.</para>

<para xml:id='p29'>Note that it's designed to format exactly the WebArch document; it
is not a general-purpose HTML stylesheet. But you might be able to turn it
into one by adding the appropriate templates. I've only provided templates
for exactly the XHTML elements used in WebArch.</para>

<para xml:id='p30'>If you want to see the results on A4 paper, you can simply set the
<parameter>paper.type</parameter> parameter to “A4”.
The page master markup is a simplified copy of the markup from the
DocBook stylesheets. In the interest of simplicity, I've preserved many but
not all of the parameters.</para>
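<para>The general pattern behind a parameter like this can be sketched
as follows. Only the <parameter>paper.type</parameter> name and the
“A4” value come from the stylesheet as described here; the default
value, the variable names, and the margins are assumptions for
illustration.</para>

<programlisting language="xml"><![CDATA[<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Can be overridden when the stylesheet is run (default is assumed) -->
  <xsl:param name="paper.type" select="'USletter'"/>

  <!-- Page dimensions are derived from the paper type -->
  <xsl:variable name="page.width">
    <xsl:choose>
      <xsl:when test="$paper.type = 'A4'">210mm</xsl:when>
      <xsl:otherwise>8.5in</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <xsl:variable name="page.height">
    <xsl:choose>
      <xsl:when test="$paper.type = 'A4'">297mm</xsl:when>
      <xsl:otherwise>11in</xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page"
            page-width="{$page.width}" page-height="{$page.height}">
          <fo:region-body margin-top="1in" margin-bottom="1in"
                          margin-left="1in" margin-right="1in"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <fo:block>Paper type: <xsl:value-of select="$paper.type"/></fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>

<para>Most XSLT processors let you set a top-level parameter when you
run the transformation; with <application>xsltproc</application>, for
example, the option is <literal>--stringparam paper.type A4</literal>.</para>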

<para xml:id='p31'>Share and enjoy.</para>
</section>
</essay>
