A critique of cgi.escape

Mon Sep 25 11:08:43 EDT 2006

Jon Ribbens wrote:

> I'm sorry, that's not good enough. How, precisely, would it break
> "existing code"?

('owdo Mr. Ribbens!)

It's possible there could be software that relies on ' not being
escaped, for example:

    import cgi

    # Auto-markup links to O'Reilly, everyone's favourite
    # example name with an apostrophe in it
    #
    URI= 'http://www.oreilly.com/'
    html= cgi.escape(text)  # text is the raw input; ' is left alone
    html= html.replace('O\'Reilly', '<a href="%s">O\'Reilly</a>' % URI)

Sure, this may be rare, but leaving ' unescaped is what the
documentation says, and changing it may not only fix things but also
subtly break things in ways that are hard to detect.
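
To make the failure concrete, here is a rough sketch of the two
behaviours side by side. It assumes html.escape from later Python
versions as a stand-in: quote=False approximates the documented
cgi.escape behaviour, and quote=True approximates the proposed change
(it escapes ' as well as "). The sample text and the link_oreilly
helper are made up for illustration.

```python
import html

URI = 'http://www.oreilly.com/'
text = "Books from O'Reilly & friends"  # made-up sample input


def link_oreilly(escaped):
    # Same post-processing step as above: look for the literal apostrophe.
    return escaped.replace("O'Reilly", '<a href="%s">O\'Reilly</a>' % URI)


# Stand-in for the documented behaviour: apostrophes pass through.
old_style = html.escape(text, quote=False)
# Stand-in for the proposed behaviour: apostrophes become a character
# reference, so the literal "O'Reilly" no longer occurs in the string.
new_style = html.escape(text, quote=True)

linked_old = link_oreilly(old_style)
linked_new = link_oreilly(new_style)

print(linked_old)
print(linked_new)
```

With the second variant the replace silently matches nothing: no
exception, no warning, just a link that stops appearing.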

A comparable change to str.encode('unicode-escape') in Python 2.5
caused a number of similarly subtle problems. (In that case the old
documentation was a bit woolly, so it didn't prescribe the exact older
behaviour.)

I'm not saying that the cgi.escape interface is *good*, just that it's
too late to change it.

I personally think the entire function should be deprecated, firstly
because it's insufficient in some corner cases (apostrophes as you
pointed out, and XHTML CDATA), and secondly because it's in the wrong
place: HTML-escaping has nothing to do with the CGI interface. A good
template library should deal with escaping more smoothly and correctly
than cgi.escape. (It may be able to decide automatically whether a
value needs escaping at all, and handle character encoding issues, for
example.)
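
As a rough idea of what "automatically" could look like, here is a
minimal sketch, not any particular library's API. Safe, to_html and
render are hypothetical names invented for this example, and
html.escape from later Python versions is assumed for the escaping
itself. Values already marked as markup are passed through; everything
else is escaped, apostrophes included.

```python
import html


class Safe(str):
    """Hypothetical marker type: a string that is already valid HTML."""


def to_html(value):
    # Already-marked markup passes through untouched; anything else is
    # escaped, including both kinds of quote.
    if isinstance(value, Safe):
        return value
    return Safe(html.escape(str(value), quote=True))


def render(template, **values):
    # Escape every substituted value, then %-format it into the template.
    # The result is itself markup, so it is marked Safe.
    escaped = {name: to_html(value) for name, value in values.items()}
    return Safe(template % escaped)


# A single-quoted attribute is fine here because ' is escaped too.
link = render("<a href='%(uri)s'>%(label)s</a>",
              uri="http://www.oreilly.com/", label="O'Reilly & co")
# Nesting: link is already Safe, so it is not escaped a second time.
page = render("<p>%(body)s</p>", body=link)

print(page)
```

The caller never decides whether to escape; the type of the value does,
which is the sort of thing that belongs in a template layer rather than
in the cgi module.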

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/