Public Identifier Transcription in RFC 3151

Volume 7, Issue 45; 14 Mar 2004; last modified 08 Oct 2010

Why is the single quote character (') escaped as “%27”?

An expert is a person who has made all the mistakes that can be made in a very narrow field.

Niels Bohr

Back in 2001, Paul Grosso, John Cowan, and I wrote RFC 3151, A URN Namespace for Public Identifiers, in order to preserve public identifiers in a URI-only world. It fell silently into the ocean of specifications and vanished, leaving only the faintest ripple.

Then, out of the blue a few weeks ago, came a series of probing questions about finicky details of the character transcription rules for public identifiers. Apparently, someone used RFC 3151 in an assignment for a university course and a bunch of students read it really carefully.

Why, the students asked, is the single quote character (') escaped as “%27”? They pointed out, correctly, that neither RFC 2141 nor RFC 2396 requires it to be escaped.

The answer: we goofed. It shouldn’t have to be escaped. When we were writing the specification, John, Paul, and I (mostly John) constructed a big table, showing each character and what each specification said about how it had to be represented. Somewhere along the way, we got it into our heads that the single quote had to be escaped. We were wrong, but we never noticed.

Luckily, the error is harmless. Section 2.3 of RFC 2396 lists the single quote among the unreserved “mark” characters and says that unreserved characters can be escaped without changing the semantics of the URI.
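To make that equivalence concrete, here is a minimal Python sketch that decodes only those %XX escapes which encode an RFC 2396 unreserved character (alphanumerics plus the mark set, which includes the single quote). The helper name decode_unreserved and the public identifier in the example are hypothetical. The sketch also assumes that whatever compares your URIs applies this generic URI rule; URN-specific comparison rules may be stricter.

```python
import re

# RFC 2396, section 2.3: unreserved = alphanum | mark, and the mark set
# includes the single quote.
MARK = "-_.!~*'()"


def decode_unreserved(uri: str) -> str:
    """Hypothetical helper: undo %XX escapes of unreserved characters only."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch.isascii() and (ch.isalnum() or ch in MARK):
            return ch           # escaping this character never changed its meaning
        return match.group(0)   # keep escapes of reserved and other characters
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)


escaped = "urn:publicid:-:Example:DTD+Bob%27s+Thing:EN"
literal = "urn:publicid:-:Example:DTD+Bob's+Thing:EN"
print(decode_unreserved(escaped) == decode_unreserved(literal))  # prints True
```

Escapes of reserved characters such as “/” (“%2F”) are deliberately left alone, because decoding them could change how the URI is parsed.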

I suppose if the publicid URN namespace had been wildly successful, we might have updated the RFC to correct this error. As it is, I think the prudent thing to do is just live with it. (Strictly speaking, to conform to RFC 3151, you still have to escape single quotes as “%27”.)

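The special rules for semicolon (;), colon (:), and plus (+) are necessary because RFC 3151 uses those characters to represent syntactically significant pieces of the original public identifier: “:” stands for “//”, “;” stands for “::”, and “+” stands for a space. Literal occurrences of those characters must therefore be escaped.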
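The transcription itself can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name transcribe_publicid is hypothetical, and the character table is transcribed from RFC 3151's rules (including the unnecessary “%27” for the single quote), so check it against the RFC before relying on it.

```python
# Single characters with special meaning in a publicid URN, mapped to their
# transcriptions (per RFC 3151; "'" is included even though it need not be).
CHAR_MAP = {
    " ": "+",
    "+": "%2B",
    ":": "%3A",
    "/": "%2F",
    ";": "%3B",
    "'": "%27",
    "?": "%3F",
    "#": "%23",
    "%": "%25",
}


def transcribe_publicid(public_id: str) -> str:
    """Hypothetical helper: transcribe a public identifier into a publicid URN."""
    # Normalize whitespace first: strip the ends and collapse internal runs
    # of whitespace to a single space.
    text = " ".join(public_id.split())
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "//":      # field separator in a formal public identifier
            out.append(":")
            i += 2
        elif pair == "::":
            out.append(";")
            i += 2
        else:
            ch = text[i]
            out.append(CHAR_MAP.get(ch, ch))
            i += 1
    return "urn:publicid:" + "".join(out)


print(transcribe_publicid("ISO/IEC 10179:1996//DTD DSSSL Architecture//EN"))
# prints urn:publicid:ISO%2FIEC+10179%3A1996:DTD+DSSSL+Architecture:EN
```

Because the two-character sequences are checked before the single characters, “//” becomes “:” while a lone “/” becomes “%2F”, and a literal “+” can never be confused with a transcribed space.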

Anyway, hopefully Google will lead the next group of curious students to this answer.