[Web-SIG] Request for Comments on upcoming WSGI Changes

Robert Brewer fumanchu at aminus.org
Mon Sep 21 17:18:12 CEST 2009


And Clover wrote:
> > A middleware might re-decode the values if the `wsgi.uri_encoding`
> > is `iso-8859-1` and only then.
> 
> Seems like a mistake. If the middleware knows iso-8859-7 is in use, it
> would need to transcode the charset regardless of whether the
> initially-submitted bytes were a valid UTF-8 sequence or not. Otherwise
> the application would break when fed with e.g. Greek words that happened
> to encode to valid UTF-8 bytes.

If the entire site expects iso-8859-7 Request-URLs, then the deployer
should tell the WSGI server to decode them using iso-8859-7 instead of utf-8.

If only part of the site expects iso-8859-7 then...yeah, it needs to
transcode. So what?
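
A minimal sketch of such a transcoding middleware in Python 3. It assumes
the draft's `wsgi.uri_encoding` key records which codec the server used for
PATH_INFO. `transcode_path` is a hypothetical name, not an existing API.
Re-encoding with the codec the server used recovers the original bytes
whether the server's UTF-8 attempt succeeded or it fell back to iso-8859-1,
so one code path covers both of the cases raised above.

```python
def transcode_path(app, target="iso-8859-7"):
    """Hypothetical middleware: re-decode PATH_INFO using `target`.

    Assumes the proposed (not final) WSGI 1.1 key 'wsgi.uri_encoding'
    names the codec the server used to decode PATH_INFO.
    """
    def middleware(environ, start_response):
        used = environ.get("wsgi.uri_encoding", "iso-8859-1")
        # Encoding with the server's codec gives back the raw request bytes,
        # both when UTF-8 decoding succeeded and when it fell back to latin-1.
        raw = environ.get("PATH_INFO", "").encode(used)
        # Strict decoding: bytes undefined in `target` raise UnicodeDecodeError.
        environ["PATH_INFO"] = raw.decode(target)
        environ["wsgi.uri_encoding"] = target
        return app(environ, start_response)
    return middleware
```

For example, a path whose raw bytes are `C2 A9` is valid UTF-8, so the
server would hand the application '/©'. Wrapped in this middleware, the
application sees the intended Greek '/Β©' instead.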

> > The application MUST use this value to decode the ``'QUERY_STRING'``
> > as well.
> 
> This will break all use of non-UTF-8 encodings in QUERY_STRING, where
> the path part of the URL does not contain non-UTF-8 sequences. That
> includes the very common case where the path part contains only ASCII.
> 
>      http://greek.example.com/myscript.cgi?x=%C2
> 
> will fail, as the given UTF-8 sniffer only looks at the path part to
> determine what encoding to use for both of the path part and the query
> string.

No, it won't fail. WSGI servers do not perform %-decoding of the
QUERY_STRING, so it stays ASCII and the application is free to decode it
with whatever charset it expects. In the example given, a WSGI 1.1 server
will set the Python 3 environ values:

{'SCRIPT_NAME': '',
 'PATH_INFO': '/myscript.cgi',
 'QUERY_STRING': 'x=%C2'}
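
A rough Python 3 sketch of that server behaviour. It assumes the draft's
sniffing rule (try UTF-8 on the %-decoded path, fall back to iso-8859-1)
and its `wsgi.uri_encoding` key. `build_environ` is a hypothetical helper,
not part of any real server. Only the path is %-decoded, and the
application then decodes the query string itself with the standard
library's `urllib.parse.parse_qs`, choosing its own charset.

```python
from urllib.parse import parse_qs, unquote_to_bytes


def build_environ(raw_path, raw_query):
    """Hypothetical server helper; raw_path/raw_query are request-line bytes."""
    path_bytes = unquote_to_bytes(raw_path)  # %-decode the path only
    try:
        path, used = path_bytes.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        path, used = path_bytes.decode("iso-8859-1"), "iso-8859-1"
    return {
        "SCRIPT_NAME": "",
        "PATH_INFO": path,
        # Not %-decoded; latin-1 keeps any stray raw bytes intact.
        "QUERY_STRING": raw_query.decode("iso-8859-1"),
        "wsgi.uri_encoding": used,
    }


if __name__ == "__main__":
    env = build_environ(b"/myscript.cgi", b"x=%C2")
    print(env)
    # The Greek application picks iso-8859-7 for its form data.
    print(parse_qs(env["QUERY_STRING"], encoding="iso-8859-7"))
```

Because the path here is pure ASCII, the sniffer picks utf-8, yet the
query string is untouched and still decodes to the Greek 'Β' (U+0392).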


Robert Brewer
fumanchu at aminus.org


