Public Identifier Transcription in RFC 3151
Why is the single quote character (') escaped as “%27”?
An expert is a person who has made all the mistakes that can be made in a very narrow field.
Back in 2001, Paul Grosso, John Cowan, and I wrote RFC 3151, A URN Namespace for Public Identifiers, in order to preserve public identifiers in a URI-only world. It fell silently into the ocean of specifications and vanished, leaving only the faintest ripple.
Then, out of the blue a few weeks ago, came a series of probing questions about finicky details of the character transcription rules for public identifiers. Apparently, someone used RFC 3151 in an assignment for a university course and a bunch of students read it really carefully.
Why, the students asked, is the single quote character (') escaped as “%27”? They correctly point out that neither RFC 2141 nor RFC 2396 requires it to be escaped.
The answer: we goofed. It shouldn’t have to be escaped. When we were writing the specification, John, Paul, and I (mostly John) constructed a big table, showing each character and what each specification said about how it had to be represented. Somewhere along the way, we got it into our heads that the single quote had to be escaped. We were wrong, but we never noticed.
Luckily, the error is harmless. Section 2.3 of RFC 2396 says specifically that escaping a single quote does not change its semantics.
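As a small illustration of why the extra escape is harmless, the sketch below percent-decodes two spellings of the same URN and compares them. The public identifier and both URNs are made up for the example, and Python's `urllib.parse.unquote` stands in for a generic percent-decoder; it is not something RFC 3151 prescribes.

```python
from urllib.parse import unquote

# Two spellings of the same (hypothetical) URN: one leaves the single quote
# literal, the other escapes it as %27 the way RFC 3151 asks.
literal = "urn:publicid:-:Example:DTD+Author's+Notes:EN"
escaped = "urn:publicid:-:Example:DTD+Author%27s+Notes:EN"

# The single quote is an unreserved "mark" character in RFC 2396, so
# escaping it does not change meaning: both forms decode to the same string.
same_after_decoding = unquote(literal) == unquote(escaped)
print(same_after_decoding)
```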
I suppose if the publicid URN scheme had been wildly successful, we might have updated the RFC to correct this error. As it is, I think the prudent thing to do is just live with it.
(To conform to RFC 3151, you do have to escape single quotes.)
The special rules for semicolon (;), colon (:), and plus (+) are necessary because RFC 3151 uses them to represent syntactically significant pieces of the original public identifier.
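To make the role of those three characters concrete, here is a minimal sketch of the transcription in Python. The replacement table is my reading of RFC 3151 and should be checked against the RFC itself; the function name `publicid_to_urn` is hypothetical. Because ":" stands for "//", ";" for "::", and "+" for a space, any literal colon, semicolon, or plus in the public identifier has to be percent-escaped to stay distinguishable.

```python
import re

# Transcription table as I understand it from RFC 3151 (verify against the RFC).
# ":" , ";" and "+" are claimed as stand-ins for "//", "::" and " ",
# so their literal occurrences must be escaped.
_TRANSCRIPTION = {
    "//": ":",
    "::": ";",
    " ": "+",
    "+": "%2B",
    ":": "%3A",
    "/": "%2F",
    ";": "%3B",
    "'": "%27",  # the unnecessary but required escape discussed above
    "?": "%3F",
    "#": "%23",
    "%": "%25",
}

# Two-character sequences come first so "//" and "::" win over "/" and ":".
_PATTERN = re.compile(r"//|::|[ +:/;'?#%]")


def publicid_to_urn(public_id: str) -> str:
    """Transcribe a public identifier into a publicid URN (illustrative only)."""
    # Trim the ends and collapse runs of whitespace to a single space.
    normalized = " ".join(public_id.split())
    # A single pass keeps already-substituted characters from being escaped again.
    body = _PATTERN.sub(lambda m: _TRANSCRIPTION[m.group(0)], normalized)
    return "urn:publicid:" + body


print(publicid_to_urn("-//OASIS//DTD DocBook XML V4.1.2//EN"))
print(publicid_to_urn("ISO/IEC 10179:1996//DTD DSSSL Architecture//EN"))
```

Doing the substitution in one regular-expression pass, rather than as a chain of `str.replace` calls, avoids the trap of turning "//" into ":" and then escaping that new colon as "%3A".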
Anyway, hopefully Google will lead the next group of curious students to this answer.