Numbered program listings

Volume 15, Issue 21; 19 May 2012

Putting line numbers in program listings is harder than it looks.

At first glance, adding line numbers to program listings doesn't seem too difficult: take the text of the listing, break it into a sequence of lines at each newline character, and number them.
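As a minimal sketch of that naive approach (Python, purely for illustration; the name `number_lines` and the ` | ` separator are made up here and have nothing to do with the DocBook stylesheets):

```python
def number_lines(listing: str, start: int = 1) -> str:
    """Prefix each line of a plain-text listing with a right-aligned number."""
    lines = listing.split("\n")
    # A trailing newline would otherwise yield an extra, empty numbered line.
    if lines and lines[-1] == "":
        lines.pop()
    # Pad every number to the width of the largest one so the text lines up.
    width = len(str(start + len(lines) - 1))
    return "\n".join(
        f"{n:>{width}} | {text}" for n, text in enumerate(lines, start)
    )


if __name__ == "__main__":
    print(number_lines("This is the first line.\nThis is the second line.\n"))
```

This only works because the input is a flat string with nothing but characters and newlines in it.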

Except, at least in DocBook, program listings can contain markup:

This is the first line.
This is the second line. See the call out?
This is the third line.
This is the fourth line.
Etc.

Things to observe:

  1. Lines can contain callouts.

  2. Lines can contain line annotations.

  3. Markup, such as the emphasis that begins at “third line”, can cross line boundaries.

  4. Although not shown, inlinemediaobject can add graphical elements to a program listing.

The DocBook stylesheets work fairly hard to unwind all the markup and construct discrete lines. For example, elements that cross line boundaries are ended before the newline and restarted after. This allows embedded line numbers and separators and such to be styled consistently.
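To illustrate the "end before the newline, restart after" idea, here is a rough sketch in Python. It is not the stylesheets' actual code (those are XSLT); the event-tuple representation and the name `split_into_lines` are assumptions made up for this example.

```python
def split_into_lines(events):
    """Split a flat stream of markup events into per-line, balanced lists.

    `events` is a sequence of ("start", name), ("end", name) or
    ("text", string) tuples. Each returned line is itself well balanced:
    elements still open at a newline are closed on the old line and
    reopened at the start of the new one.
    """
    lines = [[]]
    open_elements = []  # names of the elements currently open, outermost first
    for kind, value in events:
        if kind == "start":
            open_elements.append(value)
            lines[-1].append((kind, value))
        elif kind == "end":
            open_elements.pop()
            lines[-1].append((kind, value))
        else:
            for i, piece in enumerate(value.split("\n")):
                if i > 0:
                    # A newline was crossed: close innermost-first ...
                    for name in reversed(open_elements):
                        lines[-1].append(("end", name))
                    lines.append([])
                    # ... and reopen outermost-first on the new line.
                    for name in open_elements:
                        lines[-1].append(("start", name))
                if piece:
                    lines[-1].append(("text", piece))
    return lines
```

Once every line is a self-contained unit like this, a number or separator can be attached to each one without landing in the middle of an open element.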

The problem with the program listing above is that you can't cut-and-paste it from the browser. If you do, you'll get all the line numbers mixed in with the content.

It's been suggested that the ability to cut-and-paste should be a “hard requirement” for HTML rendering of numbered program listings.

Yeah. Well.

As far as I can tell, the way this works on other sites is by using a two column, single row table. You put the line numbers in the first cell and the listing in the second cell. Cut-and-paste then captures only what's in the “content” cell, not the numbers.
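A sketch of what such a table might look like, generated from plain text with Python. The class names `listing`, `linenos`, and `code` and the function name `numbered_table` are invented for this example; I don't know exactly what markup any particular site uses.

```python
from html import escape


def numbered_table(listing: str) -> str:
    """Render a plain-text listing as a one-row, two-cell HTML table."""
    lines = listing.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing newline
    # First cell: just the numbers, one per line.
    numbers = "\n".join(str(n) for n in range(1, len(lines) + 1))
    # Second cell: the listing itself, escaped for HTML.
    code = "\n".join(escape(line) for line in lines)
    return (
        '<table class="listing"><tr>'
        f'<td class="linenos"><pre>{numbers}</pre></td>'
        f'<td class="code"><pre>{code}</pre></td>'
        "</tr></table>"
    )


if __name__ == "__main__":
    print(numbered_table("if a < b:\n    swap(a, b)\n"))
```

The numbers and the code live in separate `pre` elements, so nothing ties row N of one cell to row N of the other except that both happen to have the same line height.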

That would be ok except for one thing: because there can be embedded markup in the program listing, the relative height of each line can vary some. When it does, the line numbers in the first column get out-of-sync with the data in the second column.

The standard callout graphics are just a smidgen taller than the text, so it's possible to adjust the line height to make them work. But I can't think of anything to do about the more general problem of disparate line heights caused by other markup.

I think this isn't a problem for other sites because they're only displaying plain text. There's no embedded markup so the line heights are always consistent as long as they use the same font in both cells.

It's not at all clear to me what to do for DocBook.

Comments

Well, for standard callouts (http://en.wikipedia.org/wiki/Enclosed_alphanumerics), why bother with the graphic in the first place? For the rest, I think the line numbers can reasonably be drawn with not-too-horrifying JavaScript, à la http://ofcode.org/

—Posted by Gavin Carothers on 19 May 2012 @ 10:32 UTC #

"the relative height of each line can vary some." Visually, if you adjusted the line height to the higher of the two, would that solve the issue without offending anyone?

—Posted by Dave Pawson on 20 May 2012 @ 06:53 UTC #

Yes, it would. But for lines that might contain superscripts or random graphics, it's impossible for the stylesheets to know what measure to use for "the higher of the two".

—Posted by Norman Walsh on 20 May 2012 @ 10:53 UTC #

At least Firefox does not include text shown through CSS content in copy-and-paste content.

So something like the CSS rule span.linenumber:before { content: attr(number) ': '; }, together with an empty span with class linenumber and an attribute number holding the line number itself, might work.

Not sure about other browsers, though.

—Posted by Morus Walter on 21 May 2012 @ 10:56 UTC #
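A rough sketch of the generated-content idea from the last comment, again in Python and again only for plain-text input. The CSS rule, the `linenumber` class, and the `number` attribute are taken from the comment; the function name `css_numbered_listing` is made up, and whether a given browser leaves the generated numbers out of copied text is an assumption that would need testing.

```python
from html import escape

# The rule suggested in the comment: the number is drawn by CSS from the
# span's "number" attribute, so it never appears as text in the document.
LINE_NUMBER_CSS = "span.linenumber:before { content: attr(number) ': '; }"


def css_numbered_listing(listing: str) -> str:
    """Render a listing with an empty numbering span in front of each line."""
    lines = listing.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a trailing newline
    body = "\n".join(
        f'<span class="linenumber" number="{n}"></span>{escape(line)}'
        for n, line in enumerate(lines, 1)
    )
    return f"<style>{LINE_NUMBER_CSS}</style>\n<pre>{body}</pre>"


if __name__ == "__main__":
    print(css_numbered_listing("This is the first line.\nThis is the second line.\n"))
```

Because the number sits on the same line as the content it labels, this layout would not suffer from the two-column synchronization problem described above.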