A critique of cgi.escape

Tue Sep 26 03:00:10 EDT 2006

Lawrence D'Oliveiro <ldo at geek-central.gen.new_zealand> wrote:

> In message <Xns984996E6BABCEduncanbooth at 127.0.0.1>, Duncan Booth
> wrote: 
> 
>> If I have a unicode string such as: u'\u201d' (right double quote),
>> then I want that encoded in my html as '&#8221;' (or &rdquo; but the
>> numeric form is better).
> 
> Right-double-quote is not an HTML special, so there's no need to quote
> it. I'm only concerned here with characters that have special meanings
> in HTML markup.

There is no need to quote " or ' either except in particular situations, such as inside a quoted attribute value.
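As a rough illustration, assuming Python 3, where `html.escape` has taken over from `cgi.escape` (which was removed in Python 3.8): in element content only `&`, `<` and `>` must be escaped, while a `"` only becomes a problem inside a double-quoted attribute value. Note that `html.escape(..., quote=True)` also escapes `'`, which `cgi.escape` never did.

```python
import html

text = 'He said "hi" & left'

# Element content: only &, < and > are special here.
body = html.escape(text, quote=False)

# Double-quoted attribute value: " must be escaped too.
attr = html.escape(text, quote=True)

print(body)  # He said "hi" &amp; left
print(attr)  # He said &quot;hi&quot; &amp; left
```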

Would you care to suggest how you get a right double quote into any
ISO-8859-1 encoded web page without quoting it? Even if the page is UTF-8
encoded, quoting it can be a good idea.

> 
>> There should be a one-stop shop where I can take my unicode text and
>> convert it into something I can safely insert into a generated html
>> page; at present I need to call both cgi.escape and s.encode to get
>> the desired effect.
> 
> What you're really asking for is a version of cgi.escape that a) fixes
> the bugs discussed in this thread, and b) copes with different
> encodings while doing so.
> 
> To handle b), you would need to pass it some indication of what the
> encoding of the string is. In any case, converting a literal
> right-double-quote to &#8221; is not relevant to the purpose of
> cgi.escape. 
> 

You don't seem to understand about HTML entity escapes. &#8221; is a valid
way to express a right double quote whatever the page encoding. There is no
need to know the encoding of the page in order to escape entities: just
escape anything which can be problematic.
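The "one-stop shop" can be sketched in a couple of lines, assuming Python 3 (`html.escape` in place of `cgi.escape`); the name `html_safe` is hypothetical, not an existing library function. It escapes the markup characters first and then turns every non-ASCII character into a numeric reference, so the output is plain ASCII and can go into a page declared as ISO-8859-1, UTF-8, or any other ASCII-compatible encoding without knowing which.

```python
import html


def html_safe(text: str, quote: bool = True) -> bytes:
    """Hypothetical helper: escape HTML specials, then make the result pure ASCII.

    Escaping must come first; encoding first would produce '&#8221;' and the
    subsequent escape would turn its '&' into '&amp;'.
    """
    escaped = html.escape(text, quote=quote)
    # Any character outside ASCII becomes a decimal reference such as &#8221;.
    return escaped.encode("ascii", "xmlcharrefreplace")


print(html_safe('<a title="x">\u201dcaf\u00e9</a>'))
# b'&lt;a title=&quot;x&quot;&gt;&#8221;caf&#233;&lt;/a&gt;'
```

Because the result contains only ASCII bytes, no knowledge of the target page's encoding is needed, which is the point being made above.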