[Web-SIG] Draft PEP: WSGI 1.1

Thu Apr 15 17:30:59 CEST 2010

Dirkjan Ochtman wrote:

> 1. The application is passed an instance of a Python dictionary
>    containing what is referred to as the WSGI environment. All keys
>    in this dictionary are native strings. For CGI variables, all names
>    are going to be ISO-8859-1 and so where native strings are
>    unicode strings, that encoding is used for the names of CGI
>    variables.

Perhaps explain where those ISO-8859-1 bytes might come from:

     ...are native strings. Where native strings are Unicode, any
     keys derived from byte-oriented sources (such as custom headers
     in the HTTP request reflected in the CGI environment variables)
     should be decoded using the ISO-8859-1 encoding.
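
As a rough illustration of that wording, here is a minimal sketch of a
gateway turning a byte-oriented CGI environment into native strings on
Python 3. The function name decode_cgi_environ and the sample header
are made up for this example; they are not part of the draft.

```python
def decode_cgi_environ(raw_environ):
    """Decode a bytes-keyed CGI environment into native (Unicode) strings.

    ISO-8859-1 maps every byte 0x00-0xFF to the code point of the same
    number, so this decode can never fail and never loses information.
    """
    return {
        key.decode("iso-8859-1"): value.decode("iso-8859-1")
        for key, value in raw_environ.items()
    }


# Hypothetical raw environment, as a byte-oriented server might see it.
raw = {b"HTTP_X_CUSTOM": b"caf\xe9", b"REQUEST_METHOD": b"GET"}
environ = decode_cgi_environ(raw)
```

Every key and value in environ is then a str, and the original bytes
can still be recovered by encoding back to ISO-8859-1.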

> 3. For the CGI variables contained in the WSGI environment, the values
>    of the variables are native strings. Where native strings are
>    unicode strings, ISO-8859-1 encoding would be used such that the
>    original character data is preserved and as necessary the unicode
>    string can be converted back to bytes and thence decoded to unicode
>    again using a different encoding.

Good. The only problem that remains with this is that in certain
environments (notably: all IIS use, not just CGI) a WSGI gateway cannot
fully comply with this requirement, because it cannot be sure the
original bytes were preserved before it saw them. Should we:

a. disallow environments that cannot be sure they are preserving the 
original byte data from declaring that they support wsgi.version 1.1?

b. add an extra wsgi.something flag for a WSGI server to add, to specify 
that it is sure that the original bytes have been preserved? (i.e. so 
wsgiref's CGI handler would have to declare it wasn't sure when running 
under Windows.)

c. just let WSGI gateways silently ignore the ISO-8859-1 requirement if 
they can't honour it and let the application spend its time trying to 
unravel the mess (status quo).

(Can wsgiref be fixed to use ISO-8859-1 in time for Python 3.2?)
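
To make the failure mode concrete, here is a small sketch of the round
trip that point 3 relies on, and of what the application sees when the
bytes were not preserved. The helper recover_text and the "lossy"
decode are assumptions chosen for illustration; a real non-compliant
gateway may mangle the data in other ways.

```python
def recover_text(environ_value, encoding="utf-8"):
    """Undo the ISO-8859-1 decode, then decode with the real encoding."""
    return environ_value.encode("iso-8859-1").decode(encoding)


# Bytes as sent by a client that encoded the path as UTF-8.
original_bytes = "/caf\u00e9".encode("utf-8")

# Compliant gateway: bytes decoded as ISO-8859-1, so nothing is lost.
compliant_value = original_bytes.decode("iso-8859-1")
recovered = recover_text(compliant_value)

# Hypothetical non-compliant gateway: the platform already decoded the
# bytes lossily, so the original data is gone before WSGI sees it.
lossy_value = original_bytes.decode("ascii", errors="replace")
try:
    recover_text(lossy_value)
    recoverable = True
except UnicodeEncodeError:
    # U+FFFD has no ISO-8859-1 encoding; the round trip cannot work.
    recoverable = False
```

Option b amounts to telling the application up front which of these two
situations it is in, instead of leaving it to find out as in option c.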

> 7. The iterable returned by the application and from which response
>    content is derived, should yield byte strings. Where native strings
>    are unicode strings, the native string type can also be returned in
>    which case it would be encoded as ISO-8859-1.

> 8. The value passed to the 'write()' callback returned by
>    'start_response()' should be a byte string. Where native strings
>    are unicode strings, a native string type can also be supplied, in
>    which case it would be encoded as ISO-8859-1.

Weren't we going to only allow US-ASCII for the output? (These threads 
are always so far apart I can never remember what conclusion we 
reached... if any.)

Whilst ISO-8859-1 is in the HTTP standard for headers, and required to 
preserve bytes in input, it's much, much less likely that the response 
body is going to be ISO-8859-1. It could maybe be cp1252, but more 
likely the author wanted UTF-8.

If we must support Unicode strings for response body output at all, I'd 
prefer to be conservative here and spit a UnicodeEncodeError straight 
away, rather than quietly mangle characters U+0080 to U+00FF.
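
A minimal sketch of the conservative behaviour, assuming a gateway-side
helper (encode_body_chunk is a made-up name, not something in the
draft): byte strings pass through untouched, and native strings are
encoded strictly as US-ASCII so that anything else fails loudly.

```python
def encode_body_chunk(chunk, encoding="us-ascii"):
    """Return a response body chunk as bytes.

    Bytes pass through unchanged. Native (Unicode) strings are encoded
    with the default 'strict' error handler, so a character outside the
    target encoding raises UnicodeEncodeError instead of being mangled.
    """
    if isinstance(chunk, bytes):
        return chunk
    return chunk.encode(encoding)


plain = encode_body_chunk("hello")

# What an ISO-8859-1 default would silently emit for U+00E9 ...
latin1_bytes = encode_body_chunk("caf\u00e9", encoding="iso-8859-1")
# ... versus what an author who meant UTF-8 expected on the wire.
utf8_bytes = "caf\u00e9".encode("utf-8")

try:
    encode_body_chunk("caf\u00e9")
    rejected = False
except UnicodeEncodeError:
    rejected = True
```

With the ISO-8859-1 default, a page declared as UTF-8 would be sent
with a lone 0xE9 byte and no error; with US-ASCII the mistake shows up
on the first non-ASCII character.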

Manlio Perillo wrote:

> The run_with_cgi sample function should be changed, since it probably
> does not work correctly, on Python 3.x.

Yes, the 'URL Reconstruction' fragment will be wrong too, since it uses 
urllib.quote() (urllib.parse.quote() on Python 3) to encode the path 
part. On Python 3, quote() defaults to UTF-8 rather than the ISO-8859-1 
that WSGI 1.1 requires.
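
For example, on Python 3 the difference looks roughly like this. The
sample path is invented; the encoding argument to
urllib.parse.quote() is the only thing that changes between the two
calls.

```python
from urllib.parse import quote

# Bytes the client actually sent for the path (UTF-8 in this example).
raw_path = "/caf\u00e9".encode("utf-8")

# What a WSGI 1.1 gateway would put in PATH_INFO: ISO-8859-1 decoded.
path_info = raw_path.decode("iso-8859-1")

# Default (UTF-8) quoting re-encodes each pseudo-character, so the
# reconstructed URL no longer matches what the client requested.
wrong = quote(path_info)

# Quoting with ISO-8859-1 maps each character back to its original
# byte, so the percent-escapes reproduce the request bytes.
right = quote(path_info, encoding="iso-8859-1")
```

Here right comes out as '/caf%C3%A9', matching the original request,
while wrong is the double-encoded '/caf%C3%83%C2%A9'.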

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/