A critique of cgi.escape

Mon Sep 25 23:02:18 EDT 2006

In message <Xns984996E6BABCEduncanbooth at 127.0.0.1>, Duncan Booth wrote:

> If I have a unicode string such as: u'\u201d' (right double quote), then I
> want that encoded in my html as '&#8221;' (or &rdquo; but the numeric form
> is better).

Right-double-quote is not a special character in HTML, so there's no need to escape it.
I'm only concerned here with characters that have special meanings in HTML
markup.
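A minimal sketch of that distinction, assuming Python 3. cgi.escape was removed in Python 3.8, and html.escape is its replacement; unlike cgi.escape, whose quote argument defaulted to false, html.escape also escapes both quote characters by default. Only the markup-special characters are rewritten, and a right double quote passes through unchanged:

```python
from html import escape

# Only &, <, > and the two quote characters are special in HTML markup.
markup_specials = 'a < b && c > "d" \'e\''
print(escape(markup_specials))

# A right double quote (U+201D) is not special, so escape() leaves it alone.
plain = 'right quote: \u201d'
assert escape(plain) == plain
```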

> There should be a one-stop shop where I can take my unicode text and
> convert it into something I can safely insert into a generated html page;
> at present I need to call both cgi.escape and s.encode to get the desired
> effect.
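The two-step approach described above might look like this in Python 3, using html.escape in place of cgi.escape (an assumption, since the post predates Python 3). The markup-special characters are escaped first; the encode step then uses the standard `xmlcharrefreplace` error handler to turn characters the target encoding cannot represent into numeric references:

```python
from html import escape

text = '\u201dTom & Jerry\u201d'

# Step 1: escape characters that are special in HTML markup.
escaped = escape(text)

# Step 2: encode separately; xmlcharrefreplace replaces characters that
# ASCII cannot represent with numeric references such as &#8221;.
data = escaped.encode('ascii', 'xmlcharrefreplace')
print(data)  # b'&#8221;Tom &amp; Jerry&#8221;'
```

The order matters: encoding first and escaping afterwards would turn the `&` of each generated `&#8221;` into `&amp;`.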

What you're really asking for is a version of cgi.escape that a) fixes the
bugs discussed in this thread, and b) copes with different encodings while
doing so.

To handle b), you would need to pass it some indication of what the encoding
of the string is. In any case, converting a literal right-double-quote to
&#8221; is not relevant to the purpose of cgi.escape.
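To illustrate why the encoding has to be supplied, here is a sketch of a one-stop helper; `escape_for_html` is a hypothetical name, and html.escape stands in for cgi.escape. Whether the right double quote becomes `&#8221;` depends only on the target encoding, not on escaping:

```python
from html import escape


def escape_for_html(text: str, encoding: str) -> bytes:
    """Hypothetical helper: escape markup specials, then encode for `encoding`."""
    # Escape first so the '&' in generated character references is not
    # escaped a second time; unrepresentable characters become &#NNNN;.
    return escape(text, quote=True).encode(encoding, 'xmlcharrefreplace')


sample = '\u201d<x>'
print(escape_for_html(sample, 'ascii'))  # b'&#8221;&lt;x&gt;'
print(escape_for_html(sample, 'utf-8'))  # b'\xe2\x80\x9d&lt;x&gt;'
```

With UTF-8 output the quote stays a literal character; only an encoding that cannot represent it forces the numeric reference.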