[Python-Dev] PEP 3333: wsgi_string() function

Victor Stinner victor.stinner at haypocalc.com
Tue Jan 4 14:33:37 CET 2011


On Tuesday, 4 January 2011 at 13:20 +0100, Antoine Pitrou wrote:
> On Tue, 04 Jan 2011 03:44:53 +0100
> Victor Stinner <victor.stinner at haypocalc.com> wrote:
> > def wsgi_string(u):
> >     # Convert an environment variable to a WSGI "bytes-as-unicode" string
> >     return u.encode(enc, esc).decode('iso-8859-1')
> > 
> > def run_with_cgi(application):
> >     environ = {k: wsgi_string(v) for k,v in os.environ.items()}
> >     environ['wsgi.input']        = sys.stdin
> >     environ['wsgi.errors']       = sys.stderr
> >     environ['wsgi.version']      = (1, 0)
> > ...
> > --------------
> > 
> > What is this horrible encoding "bytes-as-unicode"? os.environ is
> > supposed to be correctly decoded and contain valid unicode characters.
> > If WSGI uses an encoding other than the locale encoding (which is a bad
> > idea), it should use os.environb and decode keys and values using its
> > own encoding.
> > 
> > If you really want to store bytes in unicode, str is not the right type:
> > use the bytes type and use os.environb instead.
> 
> +1. We should minimize such reencoding dances, and avoid promoting them.

The example from the PEP is specific to CGI and is a little bit special.

The reference implementation (wsgiref in py3k) only re-decodes
("transcodes") some variables:
---
_is_request = {
    'SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING', 'REQUEST_METHOD', 'AUTH_TYPE',
    'CONTENT_TYPE', 'CONTENT_LENGTH', 'HTTPS', 'REMOTE_USER', 'REMOTE_IDENT',
}.__contains__

def _needs_transcode(k):
    return _is_request(k) or k.startswith('HTTP_') or k.startswith('SSL_') \
        or (k.startswith('REDIRECT_') and _needs_transcode(k[9:]))
---
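
To make the transcoding step concrete, here is a minimal sketch of what
happens to a single value. It is not the wsgiref code itself: the helper
name transcode() is hypothetical, and the locale encoding is hard-coded to
'utf-8' as an assumption so that the result does not depend on the machine.

```python
def transcode(value, enc='utf-8', esc='surrogateescape'):
    # Hypothetical helper, for illustration only.
    # Step 1: recover the original bytes from the decoded str
    # (enc is assumed to be the encoding used to decode os.environ).
    raw = value.encode(enc, esc)
    # Step 2: map each byte to the code point with the same number,
    # which gives the "bytes-as-unicode" form.
    return raw.decode('iso-8859-1')

path = '/caf\u00e9'            # a correctly decoded value, as in os.environ
wsgi_path = transcode(path)    # same bytes, one code point per byte
print(ascii(path), ascii(wsgi_path))
```

The result is one character longer than the input, because the two UTF-8
bytes of the accented letter each become a separate code point.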

My problem is that I don't understand how I can know whether a variable was
converted to "bytes-as-unicode" or not. Graham Dumpleton told me on IRC
that the framework is supposed to re-decode some variables (e.g. PATH_INFO)
one more time. But this is not explicit in the PEP, and
_needs_transcode() is a private function.
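
As far as I understand it, the extra decoding step on the framework side
would look roughly like the sketch below. The helper name redecode() and the
choice of 'utf-8' with errors='replace' are assumptions for illustration;
nothing here is specified by the PEP.

```python
def redecode(value, charset='utf-8', errors='replace'):
    # Hypothetical helper, for illustration only.
    # A "bytes-as-unicode" str only contains code points 0-255, so
    # encoding to ISO-8859-1 gives back the original bytes unchanged.
    raw = value.encode('iso-8859-1')
    # The framework then decodes those bytes with the charset it
    # believes the client actually used (assumed to be UTF-8 here).
    return raw.decode(charset, errors)

environ = {'PATH_INFO': '/caf\xc3\xa9'}   # value in bytes-as-unicode form
path_info = redecode(environ['PATH_INFO'])
print(ascii(path_info))
```

Note that calling this on a value that was never transcoded either raises
UnicodeEncodeError (for code points above 255) or silently produces wrong
text, which is why knowing which variables were converted matters.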

Since environ already contains values of different types (e.g. wsgi.version
is a tuple, wsgi.multithread is a boolean, ...), why not keep these
variables as raw bytes?

Victor


