[Web-SIG] Should PEP 3333 be Python 3-only? What about transcoding?
P.J. Eby
pje at telecommunity.com
Thu Nov 4 00:19:34 CET 2010
As I've been tidying up wsgiref in the stdlib for PEP 3333, I've noticed
that the PEP has a bit of an issue with CGI variables.
Currently, the CGI example is the same as it is in PEP 333, which
means that it's correct code for Python 2.x, but wrong for 3.x due to
the environment transcoding issue. (See
http://bugs.python.org/issue10155 for details.)
There are other code sample differences, too. In effect, PEP 3333 is
still using Python 2 code samples, because it's trying to cover every
version of Python from 2.1 through 3.2.
Should we ditch that, and say, "hey, if you want Python 2.x code
samples, go see PEP 333?"
That will simplify a couple of things, but still won't address the
transcoding issue.
Specifically, the problem is that on Python 3, os.environ contains
*unicode*, not bytes masquerading as unicode. Unfortunately, this
means that it may well contain garbage for CGI variables: the web
server puts bytes in the environment, and Python then decodes those
bytes to unicode using the filesystem encoding + surrogateescape.
To get back to the original bytes, then, we have to encode using the
same combination, then decode those bytes as latin-1 to get a
WSGI-compatible native string.
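A minimal sketch of that round trip follows. It assumes a POSIX platform where os.environ values were decoded with sys.getfilesystemencoding() and the 'surrogateescape' error handler; the function name to_wsgi_str is hypothetical, and the raw bytes are simulated rather than read from a real server.

```python
import sys

def to_wsgi_str(value, encoding=None):
    """Turn an os.environ-style str back into a WSGI native string.

    Assumes `value` was produced by decoding raw environment bytes with
    `encoding` (default: the filesystem encoding) plus 'surrogateescape'.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Undo Python's decoding step to recover the bytes the server set...
    raw = value.encode(encoding, 'surrogateescape')
    # ...then map each byte to exactly one code point, as WSGI expects.
    return raw.decode('iso-8859-1')

# Simulate what Python does to a CGI variable on a UTF-8 system.
raw_path = b'/caf\xc3\xa9/\xff'
environ_value = raw_path.decode('utf-8', 'surrogateescape')
print(ascii(environ_value))                        # '/caf\xe9/\udcff'
print(ascii(to_wsgi_str(environ_value, 'utf-8')))  # '/caf\xc3\xa9/\xff'
```

The undecodable byte 0xff survives only because of surrogateescape; with a strict handler the first decode would have failed outright.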
The hitch is this: not everything in os.environ comes from an HTTP
request, and so not everything should be transcoded in that fashion. For
example, if you transcode TMP or HOME or even DOCUMENT_ROOT that way,
you're going to get rubbish.
In wsgiref for the stdlib, I've used a variation of And Clover's
patch in issue #10155 to implement something that *only* transcodes
CGI variables that come from the web client request, but it's
dreadfully complex.
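A rough sketch of what request-only transcoding involves follows. It is not the wsgiref code or And Clover's patch. The REQUEST_VARS set, the REDIRECT_ handling, and the helper names are all hypothetical, and the set of request-derived names is an assumption, not an authoritative list.

```python
import os
import sys

# Hypothetical, non-exhaustive set of CGI variables whose values come from
# the client's request (and so hold request bytes, not local filesystem text).
REQUEST_VARS = frozenset({
    'SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING', 'REQUEST_METHOD',
    'AUTH_TYPE', 'CONTENT_TYPE', 'CONTENT_LENGTH', 'REMOTE_USER',
})

def needs_transcode(name):
    """Guess whether an environment variable was derived from the request."""
    if name in REQUEST_VARS or name.startswith('HTTP_'):
        return True
    # Some servers re-export variables with a REDIRECT_ prefix after an
    # internal redirect; judge those by the underlying name.
    if name.startswith('REDIRECT_'):
        return needs_transcode(name[len('REDIRECT_'):])
    return False

def read_cgi_environ(source=None, encoding=None):
    """Copy `source` (default os.environ), transcoding request variables only."""
    if source is None:
        source = os.environ
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    environ = {}
    for name, value in source.items():
        if needs_transcode(name):
            # Recover the server's bytes, then re-express them as latin-1.
            value = value.encode(encoding, 'surrogateescape').decode('iso-8859-1')
        # Everything else (HOME, TMP, DOCUMENT_ROOT, ...) is left as native text.
        environ[name] = value
    return environ
```

For example, given a PATH_INFO decoded from b'/caf\xc3\xa9' and a HOME of '/home/café', read_cgi_environ(fake, 'utf-8') transcodes PATH_INFO to '/caf\xc3\xa9' and leaves HOME untouched. Even this toy version shows where the complexity comes from: the hard part is deciding which names count as "from the request".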
This isn't really a problem for wsgiref, because as far as I know,
nobody has bothered to write a CGI WSGI runner other than the one in
wsgiref and the sample in the PEP.
But it is a problem for the PEP, because the complexity involved is
high -- so high that it would completely obscure the essential simplicity
of the CGI example if it were written inline.
There are many possible ways to address this, but my current leaning is to:
1. Change the PEP 3333 code samples to Python 3 only, and
refer readers back to PEP 333 for Python 2 code samples
2. Make the CGI sample in 3333 do an indiscriminate transcode (which
only takes a few lines) and add a note indicating that a robust CGI
implementation should transcode only the CGI variables that come from
the request, suggesting the wsgiref.handlers.read_environ() code as an example.
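For comparison, the indiscriminate transcode in point 2 could be as short as the sketch below (Python 3 only, POSIX assumed). The function name cgi_environ and its stdin parameter are hypothetical. The wsgi.* values mirror what a one-request-per-process CGI runner would typically report.

```python
import os
import sys

def cgi_environ(source=None, stdin=None):
    """Build a WSGI environ from the CGI environment, transcoding everything."""
    if source is None:
        source = os.environ
    if stdin is None:
        stdin = sys.stdin.buffer  # request body is read as bytes
    enc, esc = sys.getfilesystemencoding(), 'surrogateescape'
    # Indiscriminate: every variable is transcoded, including non-request
    # ones such as HOME or TMP. A robust runner would limit this to
    # request-derived CGI variables.
    environ = {k: v.encode(enc, esc).decode('iso-8859-1')
               for k, v in source.items()}
    environ['wsgi.input'] = stdin
    environ['wsgi.errors'] = sys.stderr
    environ['wsgi.version'] = (1, 0)
    environ['wsgi.multithread'] = False
    environ['wsgi.multiprocess'] = True
    environ['wsgi.run_once'] = True
    if environ.get('HTTPS', 'off') in ('on', '1'):
        environ['wsgi.url_scheme'] = 'https'
    else:
        environ['wsgi.url_scheme'] = 'http'
    return environ
```

The trade-off is the one described above: the sample stays short, at the cost of mangling any non-ASCII values in variables that never came from the client.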
Any thoughts?