A critique of cgi.escape

Mon Sep 25 23:45:23 EDT 2006

In message <4517ec24$0$13947$edfadb0f at dread15.news.tele.dk>, Max M wrote:

> Jon Ribbens wrote:
>> In article <mailman.569.1159192888.10491.python-list at python.org>, Fredrik
>> Lundh wrote:
>>>> There's nothing to say that cgi.escape should take them both into
>>>> account in the one function
>>> so what exactly are you using cgi.escape for in your code ?
>> 
>> To escape characters so that they will be treated as character data
>> and not control characters in HTML.
>> 
>>>> What precisely do you think it would "break"?
>>> existing code, and existing tests.
>> 
>> I'm sorry, that's not good enough. How, precisely, would it break
>> "existing code"? Can you come up with an example, or even an
>> explanation of how it *could* break existing code?
> 
> 
> Some examples are:
> 
> - Possibly any code that tests for string equality in a rendered
> html/xml page.

You've got to be kidding. Any programmer knows that, to test two strings for
equality, you should compare them in a canonical (non-encoded) representation.
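
A minimal sketch of that point, assuming a modern Python where html.escape
and html.unescape stand in for cgi.escape (the helper name same_text is
hypothetical, for illustration only). The quote=False call mimics the old
default of leaving quotes alone; quote=True mimics the proposed behaviour.

```python
from html import escape, unescape


def same_text(a: str, b: str) -> bool:
    """Compare two escaped fragments by their decoded character data."""
    return unescape(a) == unescape(b)


raw = 'say "hi" & wave'
old_style = escape(raw, quote=False)  # encodes & < > only
new_style = escape(raw, quote=True)   # also encodes the quote characters

# The encoded forms differ, so comparing them directly is fragile.
print(old_style == new_style)
# The decoded (canonical) forms are identical.
print(same_text(old_style, new_style))
```

A test written against the decoded form keeps passing whichever escaping
variant produced the markup.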

> - Code that generates cgi.escaped() markup and (rightfully) for some
> reason expects the old behaviour to be used.

Whenever I use a channel-coding function, I expect the resulting output to
be fit only for feeding into the channel. I do NOT expect to do anything
else with it. Any kind of data manipulation I do, I do BEFORE feeding it
into the output channel, which means BEFORE putting it through the channel
coding.
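
As an illustration of why the order matters, here is a small sketch that
truncates a string either before or after escaping. It assumes html.escape
as a stand-in for cgi.escape; truncate_then_escape is a hypothetical helper,
not part of any library.

```python
from html import escape


def truncate_then_escape(text: str, limit: int) -> str:
    """Manipulate the raw text first, then apply the channel coding last."""
    return escape(text[:limit], quote=True)


raw = 'Tom & "Jerry"'

# Correct order: slice the raw characters, then escape the result.
good = truncate_then_escape(raw, 7)

# Wrong order: slicing the already-escaped output cuts an entity in half.
bad = escape(raw, quote=True)[:7]

print(good)
print(bad)
```

Here the second result ends in a broken "&am" fragment, which is what
happens when encoded output is treated as data to work on.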

> - Third-party code that parses/scrapes content from cgi.escaped() markup.
> (you could even break Java code this way :-s )

If that code follows the HTML rules, it will work: in character data, a
conforming parser decodes &quot; and a literal quote to the same character.
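
A rough sketch of such a consumer, using the standard library's
html.parser.HTMLParser as a stand-in for the third-party scraper (the names
TextCollector and text_of are hypothetical). It extracts the character data
from markup escaped both ways and gets the same text from each.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect character data, letting the parser decode character references."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_of(markup: str) -> str:
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


unquoted = '<p>say "hi" &amp; wave</p>'            # quotes left literal
quoted = '<p>say &quot;hi&quot; &amp; wave</p>'    # quotes encoded

print(text_of(unquoted) == text_of(quoted))
```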