Emacs, XML, Unicode
Inserting Unicode characters into emacs.
Some people do their laundry in emacs, but I find typing ^C-^X-^W-q-L-TT to add the fabric softener to be a bit cumbersome.
Tim Bray followed up (in a sense) to my essay on moving beyond DTDs with some nifty emacs code for inserting special Unicode characters directly into a buffer.
Input methods like this greatly reduce the need for entity declarations, the last remaining holdouts from my life with DTDs. Who needs:
<!DOCTYPE article [
<!ENTITY euro "&#x20AC;">
]>
And the corresponding “&euro;”, if you can just stuff a “€” into your buffer!
I read Tim’s essay and decided that he was right about some things, like “smart quotes,” but he didn’t do it quite the way I would have. So I banged away for a bit and coded up XML Characters, my own solution.
XML Characters provides four functions:
smart-double-quote
  This function, which I bind to " in nxml-mode, inserts the appropriate double quote. Called after a space, newline, or >, it inserts a left double quote. Called after a double quote, it cycles through the three possible quote styles: left, straight, or right. Called anywhere else, it inserts a right double quote.
  Inside a start tag, it always inserts just a vanilla ".
smart-single-quote
  I bind this to ' in nxml-mode and it does just what you think it does.
insert-xml-char
  This function inserts a named XML character. For example, (insert-xml-char "sect") inserts a section mark (§). The set of names is maintained in a couple of association lists, so you can easily tweak them. Called with no arguments, it pops up a menu, somewhat like Tim’s code.
  I bind this to C-t c because I have a pretty extensive Ctrl-T map that I’m used to using.
shortcut-xml-char
  Where Tim seems content to have a selection of accented characters in a menu, I decided I wanted more complete and uniform access to all the ISO Latin 1 accented characters (plus a few other things; there’s another list, so you’re free to tweak).
  For my function, I chose to read two more keystrokes and compose the appropriate character that way. I bind this to C-t e at the moment.
  So, for example, I can type C-t e e ' to insert “e acute”. Or C-t e $ y to insert a yen symbol.
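For readers who want to see the shape of these rules in code, here is a minimal Emacs Lisp sketch of the behaviour described above. It is not the XML Characters source. Every sketch- name is hypothetical, the start-tag test is an assumed heuristic (look back for the nearest < or >), treating the start of the buffer like a space is an assumption, and the two tiny tables stand in for the real, much larger lists.

```elisp
;; Illustrative sketch only; NOT the XML Characters source.
;; Every `sketch-' name below is hypothetical.

(defconst sketch-ldquo ?\u201C "Left double quotation mark.")
(defconst sketch-rdquo ?\u201D "Right double quotation mark.")

(defun sketch-in-start-tag-p ()
  "Return non-nil if point looks like it is inside a start tag.
Assumed heuristic: the nearest `<' or `>' before point is a `<' that
does not open an end tag, comment, declaration, or processing instruction."
  (save-excursion
    (and (re-search-backward "[<>]" nil t)
         (eq (char-after) ?<)
         (not (memq (char-after (1+ (point))) '(?/ ?! ??))))))

(defun sketch-smart-double-quote ()
  "Insert or cycle a double quote following the rules in the text."
  (interactive)
  (let* ((prev (char-before))
         (cycle (list sketch-ldquo ?\" sketch-rdquo))
         (tail (memq prev cycle)))
    (cond
     ;; Inside a start tag: always a plain straight quote.
     ((sketch-in-start-tag-p) (insert ?\"))
     ;; After a double quote: replace it with the next style, wrapping around.
     (tail (delete-char -1)
           (insert (or (cadr tail) (car cycle))))
     ;; After space, newline, or `>' (and, by assumption, at buffer start).
     ((or (null prev) (memq prev '(?\s ?\n ?>))) (insert sketch-ldquo))
     ;; Anywhere else: close the quotation.
     (t (insert sketch-rdquo)))))

;; Hypothetical stand-ins for the name and shortcut lists.
(defvar sketch-xml-char-names
  '(("sect" . #xA7) ("yen" . #xA5) ("eacute" . #xE9) ("euro" . #x20AC))
  "Alist mapping a character name to its Unicode code point.")

(defvar sketch-xml-char-shortcuts
  '(("e'" . "eacute") ("$y" . "yen"))
  "Alist mapping a two-keystroke string to a name in `sketch-xml-char-names'.")

(defun sketch-insert-xml-char (name)
  "Insert the character called NAME, or signal an error if it is unknown."
  (let ((code (cdr (assoc name sketch-xml-char-names))))
    (if code
        (insert code)
      (error "Unknown XML character name: %s" name))))

(defun sketch-shortcut-xml-char (key1 key2)
  "Insert the character composed from the keystrokes KEY1 and KEY2."
  (interactive (list (read-char "First key: ") (read-char "Second key: ")))
  (let ((name (cdr (assoc (string key1 key2) sketch-xml-char-shortcuts))))
    (if name
        (sketch-insert-xml-char name)
      (error "No shortcut for %c%c" key1 key2))))

;; One possible binding, applied once nxml-mode has been loaded.
(with-eval-after-load 'nxml-mode
  (define-key nxml-mode-map "\"" #'sketch-smart-double-quote))
```

Keeping the names and shortcuts in plain alists means that adding a character is a one-line change to a table rather than a change to the commands. The menu that insert-xml-char pops up when called with no arguments is left out of this sketch.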
Thanks, Tim! I didn’t know how much I needed these functions before I wrote them. In the course of writing one essay, I’ve decided I wouldn’t want to live without them.
Comments
Thanks! This is incredibly useful.
One problem, though: When I try to use the insert-xml-char function, Emacs segfaults. Any idea on why?
But entities aren't just useful for inserting characters outside the US-ASCII range. A document might, for example, contain the version of the software tool it describes; with entities I simply change one entity declaration and get the new version number in all fifty places (search and replace could go wrong and confuse version numbers of other tools mentioned). It would be great to have some mechanism for declaring simple constants in XML without having to use DTD doctype declarations with internal subsets.
Tobi: XInclude.
http://www.w3.org/TR/xinclude/
I need to do more work on it, but I've been using a separate step for character entity replacement prior to validation.
http://simonstl.com/projects/ents/
Ents uses an XML file which lists all the names and values of character entities, and I run it as a pre-processor before parsing or just directly on docs at the command line. Defining new entities is pretty simple, and one of the nicer aspects of this is that I can run it backwards to produce the entity-fied version instead of char ref version.
But then, since I don't use Emacs, I have to resort to such hackery, right?
I'm sorry Mike, I can't imagine why it segfaults. Send me your particulars in email and I'll see if I can help.