A critique of cgi.escape

Mon Sep 25 11:49:52 EDT 2006

In article <1159196923.577648.192520 at i42g2000cwa.googlegroups.com>, and-google at doxdesk.com wrote:
>> I'm sorry, that's not good enough. How, precisely, would it break
>> "existing code"?
> 
> ('owdo Mr. Ribbens!)

Good afternoon Mr Glover ;-)

>     URI= 'http://www.oreilly.com/'
>     html= cgi.escape(text)
>     html= html.replace('O\'Reilly', '<a href="%s">O\'Reilly</a>' % URI)
> 
> Sure this may be rare, but it's what the documentation says, and
> changing it may not only fix things but also subtly break things in
> ways that are hard to detect.

I'm not sure about "subtly break things", but you're right that the
above code would break. I could argue that it's broken already (since
it's doing a plain-text search on HTML data), but given real-world
considerations it's reasonable enough that I won't be that pedantic ;-)
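
To make the breakage concrete, here is a minimal sketch. The two escape
functions are stand-ins written for illustration, not the real cgi
module: escape_current mimics the documented default (only &, < and >
are translated), and escape_with_quotes is a hypothetical changed
version that also translates both quote characters. The sample text and
the linkify helper are likewise assumptions.

```python
def escape_current(s):
    """Stand-in for the documented default: only &, < and > are escaped."""
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_with_quotes(s):
    """Hypothetical changed behaviour: quote characters are escaped too."""
    return escape_current(s).replace('"', "&quot;").replace("'", "&#39;")


URI = 'http://www.oreilly.com/'
LINK = '<a href="%s">O\'Reilly</a>' % URI


def linkify(text, escape):
    """Escape the text, then do the plain-text search from the quoted code."""
    html = escape(text)
    return html.replace("O'Reilly", LINK)


text = "Books by O'Reilly & friends"   # assumed sample input
before = linkify(text, escape_current)      # search string still matches
after = linkify(text, escape_with_quotes)   # apostrophe is now an entity

print(before)
print(after)
```

With the stand-in for today's behaviour the link is inserted; with the
changed version the apostrophe has already become an entity, so the
search finds nothing and the link silently disappears.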

> I personally think the entire function should be deprecated, firstly
> because it's insufficient in some corner cases (apostrophes as you
> pointed out, and XHTML CDATA), and secondly because it's in the wrong
> place: HTML-escaping is nothing to do with the CGI interface. A good
> template library should deal with escaping more smoothly and correctly
> than cgi.escape. (It may be able to deal with escape-or-not-bother and
> character encoding issues automatically, for example.)

I agree that in most situations you should probably be using a
template library, but sometimes a simple CGI-and-manual-HTML system
suffices, and I think (a fixed version of) cgi.escape should exist at
a low level of the web application stack.
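
As a rough sketch of what "a fixed version" might look like (an
assumption for illustration, not a patch to the cgi module): the
hypothetical escape_fixed below translates both quote characters as well
as &, < and >. For comparison, escape_double_only is a stand-in for an
escape that handles " but leaves apostrophes alone. The attribute value
is made-up sample data.

```python
def escape_double_only(s):
    """Stand-in for an escape that translates " but leaves ' untouched."""
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;").replace(">", "&gt;")
    return s.replace('"', "&quot;")


def escape_fixed(s):
    """Hypothetical fixed escape: &, <, > and both quote characters."""
    # "&" goes first so the entities added afterwards are not re-escaped.
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;").replace(">", "&gt;")
    return s.replace('"', "&quot;").replace("'", "&#39;")


# Assumed hostile input, placed in an attribute delimited by apostrophes.
value = "x' onmouseover='alert(1)"
unsafe = "<input value='%s'>" % escape_double_only(value)
safe = "<input value='%s'>" % escape_fixed(value)

print(unsafe)
print(safe)
```

In the first result the apostrophe in the data closes the attribute
early and the rest is parsed as a new attribute; in the second the whole
value stays inside the quotes.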