A critique of cgi.escape

Jon Ribbens jon+usenet at unequivocal.co.uk
Mon Sep 25 10:50:32 EDT 2006


In article <Xns98499CF9DCEE4duncanbooth at 127.0.0.1>, Duncan Booth wrote:
> I guess you've never seen anyone write tests which retrieve some generated 
> html and compare it against the expected value. If the page contains any 
> unescaped quotes then this change would break it.

You're right - I've never seen anyone do such a thing. It sounds like
a highly dubious and very fragile sort of test to me, of very limited
use.

> I'm talking about encoding certain characters as entity references. It 
> doesn't matter whether it's the character ampersand or right double quote, 
> they both want to be converted to entities. Same operation.

This is that muddled thinking I was talking about. They are *not* the
same operation. You want to encode "<", for example, because it must
always be encoded to prevent it being treated as an HTML control
character. This has nothing to do with character encodings.
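
A minimal sketch of that first operation, assuming a modern Python where
html.escape stands in for cgi.escape (html.escape is not mentioned in
this thread; with quote=False it handles the same three characters, "&",
"<" and ">"). The point it illustrates is that markup escaping maps text
to text, and no codec appears anywhere:

```python
from html import escape

# Markup escaping is a str -> str operation: no bytes, no codec.
text = 'if a < b & b > c: print("ok")'
escaped = escape(text, quote=False)

print(escaped)
print(type(escaped).__name__)
```

The result is still a str, so the choice of output encoding can be made
later and independently.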

You might sometimes want to escape "right double quote" because it may
or may not be available in the character encoding you are using to output
to the browser. Yes, this might sometimes seem a bit similar to the
"<" escaping described above, because one of the ways you could avoid
the character encoding issue would be to use numeric entities, but it
is actually a completely separate issue and is none of the business of
cgi.escape.
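
The separate, encoding-side issue can be sketched as follows, assuming
Python 3 and the standard "xmlcharrefreplace" codec error handler (an
illustration chosen here, not something proposed in this thread).
Whether the right double quote becomes a numeric entity depends only on
the target encoding, and "<" is never touched by this step:

```python
quote = "\u201d"  # RIGHT DOUBLE QUOTATION MARK

# Not representable in ASCII, so the error handler substitutes a
# numeric character reference.
as_ascii = quote.encode("ascii", "xmlcharrefreplace")

# Representable in UTF-8, so no entity is needed at all.
as_utf8 = quote.encode("utf-8")

# "<" is representable in ASCII, so the codec leaves it alone; making it
# safe for HTML is a different job.
lt = "<".encode("ascii", "xmlcharrefreplace")

print(as_ascii, as_utf8, lt)
```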

By your argument, cgi.escape should in fact escape *every single*
character as a numeric entity, and even that wouldn't work properly
since "&", "#", ";" and the digits might not be in their usual
positions in the output encoding.
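
As a rough illustration of that reductio, assuming Python 3 and a
hypothetical helper named all_numeric_entities (not part of any
library), the sketch below escapes every character numerically and then
encodes the result with an EBCDIC codec, where "&", "#", ";" and the
digits do not sit at their ASCII positions:

```python
def all_numeric_entities(text):
    """Hypothetical helper: escape every character as a numeric entity."""
    return "".join(f"&#{ord(ch)};" for ch in text)

entities = all_numeric_entities("<a>")

# The entity syntax itself still has to be encoded for output, and its
# bytes differ from one encoding to another.
ascii_bytes = entities.encode("ascii")
ebcdic_bytes = entities.encode("cp037")

print(entities)
print(ascii_bytes)
print(ebcdic_bytes)
```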

> Right now the only way the Python library gives me to do the entity
> escaping properly has a side effect of encoding the string. I should
> be able to do the escaping without having to encode the string at
> the same time.

I'm getting lost here - the opposite of what you say above is true.
cgi.escape does the escaping properly (modulo failing to escape
quotes) without encoding.
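
The quote problem can be sketched like this. The function
escape_like_cgi below is a hypothetical reimplementation written for
this example, assumed to mirror cgi.escape's default of leaving quotes
alone unless asked; it is not the library source. It returns text, not
bytes, so no encoding takes place:

```python
def escape_like_cgi(s, quote=False):
    """Hypothetical stand-in for cgi.escape: text in, text out."""
    s = s.replace("&", "&amp;")  # must come first
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")
    return s

value = 'x" onmouseover="alert(1)'

# Default behaviour: the quote survives and closes the attribute early.
unsafe = '<input value="%s">' % escape_like_cgi(value)

# With quote escaping, the value stays inside the attribute.
safe = '<input value="%s">' % escape_like_cgi(value, quote=True)

print(unsafe)
print(safe)
```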


