A critique of cgi.escape

Tue Sep 26 00:43:24 EDT 2006

On Mon, 25 Sep 2006 16:48:03 +0200, Max M wrote:

> Any change in Python that has these consequences will rightfully be 
> considered a bug. So what you are suggesting is to knowingly introduce a 
> bug in the standard library!

It isn't like there have never been backwards _in_compatible changes to
the standard library before.

Ten seconds of googling finds
http://www.python.org/download/releases/2.3/highlights/:

    int() - this can now return a long when converting a string with many
    digits, rather than raising OverflowError. (New in 2.3a2: issues a
    FutureWarning when sign-folding an unsigned hex or octal literal.)

    Bastion and rexec - these modules are disabled, because they aren't
    safe in Python 2.3 (nor in Python 2.2). (New in 2.3a2.)

    Hex/oct literals prefixed with a minus sign were handled
    inconsistently. This has been fixed in accordance with PEP 237. (New
    in 2.3a2.)

    Passing a float to C functions expecting an integer now issues a
    DeprecationWarning; in the future this will become a TypeError. (New
    in 2.3a2.) 

    None - assignment to variables or attributes named None will now
    trigger a warning. In the future, None may become a keyword.

And more, all from one release.

If the behaviour of cgi.escape is "broken", or incomplete, or misleading,
then Python has a great mechanism for introducing incompatible changes
slowly: warnings.

It isn't good enough to say that the function does what it says it does,
if what it does is dangerous and misleading. Artificial example:

def sqr(x):
    """Returns the square of almost all numbers."""
    if x != 1:
        return x**2
    else:
        return -1

The function does exactly what it says, and yet its behaviour is still
dangerous and risks introducing serious bugs. If people are relying on
unit tests which include specific tests for that behaviour, then the
function and the code need to be fixed in parallel. That's what the
warnings module is for.
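
As a rough sketch of what a transitional version of that toy function
might look like: the choice of FutureWarning, the wording of the message
and the stacklevel are my assumptions for illustration, not a proposal
for any real library function.

```python
import warnings

def sqr(x):
    """Return the square of x (transitional: still returns -1 for 1)."""
    if x == 1:
        # Assumption: FutureWarning is the category used to announce a
        # planned change in behaviour; stacklevel=2 points at the caller.
        warnings.warn(
            "sqr(1) currently returns -1; a future release will return 1",
            FutureWarning,
            stacklevel=2,
        )
        return -1  # old behaviour is kept for now
    return x ** 2
```

Callers that rely on the old result keep working for a release or two,
but they are told, at the call site, that the result is going to change.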

So any arguments about "breaking code" are a red herring: if cgi.escape
does the wrong thing (and that's arguable), and code relies on that
behaviour, then the code is already broken and needs to be fixed in
parallel with the function. So can we accept that:

(1) *if* there is a problem with cgi.escape it needs to be fixed;

(and, dear gods, I would hope that nobody here wants to argue that Python
should make backwards compatibility a higher virtue than correctness!)

(2) it doesn't need to be fixed *immediately* without warning;

(3) but it can be fixed through a gradual process with warnings; and

(4) unit tests and code that expect the (presumed) bad behaviour can be
fixed gradually?
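
For point (4), one way a test suite could track down code that depends
on the old behaviour is to escalate the warning to an error while the
tests run. A minimal sketch, in which transitional() and
depends_on_old_behaviour() are hypothetical names made up for the
example:

```python
import warnings

def transitional(x):
    # Hypothetical stand-in for any function whose result is due to change.
    warnings.warn("the result for this input will change",
                  FutureWarning, stacklevel=2)
    return x

def depends_on_old_behaviour():
    """Return True if calling transitional() trips the escalated warning."""
    with warnings.catch_warnings():
        # Inside this block only, FutureWarning is raised as an exception.
        warnings.simplefilter("error", FutureWarning)
        try:
            transitional(1)
        except FutureWarning:
            return True
    return False
```

The same escalation can be switched on for a whole run from the command
line with "python -W error::FutureWarning", so nothing has to be fixed on
the day the warning first appears.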

Now that we've got that out of the way, can we CALMLY and RATIONALLY
discuss whether cgi.escape is or isn't broken?

Or, more specifically, UNDER WHAT CIRCUMSTANCES it does the wrong thing?

-- 
Steven D'Aprano