[Web-SIG] Request for Comments on upcoming WSGI Changes
Robert Brewer
fumanchu at aminus.org
Mon Sep 21 17:18:12 CEST 2009
And Clover wrote:
> > A middleware might re-decode the values if the `wsgi.uri_encoding`
> > is `iso-8859-1` and only then.
>
> Seems like a mistake. If the middleware knows iso-8859-7 is in use, it
> would need to transcode the charset regardless of whether the
> initially-submitted bytes were a valid UTF-8 sequence or not. Otherwise
> the application would break when fed with e.g. Greek words that happened
> to encode to valid UTF-8 bytes.
If the entire site expects iso-8859-7 Request-URLs, then the deployer
should tell the WSGI server to decode using iso-8859-7 instead of utf-8.
If only part of the site expects iso-8859-7 then...yeah, it needs to
transcode. So what?
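To make the two options concrete, here is a minimal Python 3 sketch. It assumes the server behaviour in the quoted draft: decode the path with a configured charset (UTF-8 by default), fall back to ISO-8859-1, and record the choice in `wsgi.uri_encoding`. `server_decode_path` and `transcode_path` are hypothetical names, not part of any spec or library. Because the middleware re-encodes with whatever charset the server recorded, it recovers the original bytes and can transcode even when Greek bytes happened to be valid UTF-8.

```python
from typing import Dict, Tuple


def server_decode_path(raw_path: bytes, primary: str = "utf-8") -> Tuple[str, str]:
    """Decode raw path bytes with the configured charset, else ISO-8859-1.

    Hypothetical sketch of the draft's server behaviour; a deployer whose
    whole site expects Greek URLs could pass primary="iso-8859-7".
    """
    try:
        return raw_path.decode(primary), primary
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        return raw_path.decode("iso-8859-1"), "iso-8859-1"


def transcode_path(environ: Dict[str, str], target: str) -> Dict[str, str]:
    """Hypothetical middleware step for the part of a site expecting `target`."""
    used = environ["wsgi.uri_encoding"]
    # Re-encoding with the charset the server used recovers the original bytes.
    raw = environ["PATH_INFO"].encode(used)
    environ["PATH_INFO"] = raw.decode(target)
    environ["wsgi.uri_encoding"] = target
    return environ


# b"/\xc3\xb6" decodes to Greek letters in ISO-8859-7 but is also valid UTF-8.
path, enc = server_decode_path(b"/\xc3\xb6")  # ("/\u00f6", "utf-8")
env = transcode_path({"PATH_INFO": path, "wsgi.uri_encoding": enc}, "iso-8859-7")
# env["PATH_INFO"] is now "/\u0393\u0386" (Greek capital gamma, alpha with tonos)
```

If the whole site is Greek, passing `primary="iso-8859-7"` to the server step makes the middleware unnecessary.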
> > The application MUST use this value to decode the ``'QUERY_STRING'``
> > as well.
>
> This will break all use of non-UTF-8 encodings in QUERY_STRING, where
> the path part of the URL does not contain non-UTF-8 sequences. That
> includes the very common case where the path part contains only ASCII.
>
> http://greek.example.com/myscript.cgi?x=%C2
>
> will fail, as the given UTF-8 sniffer only looks at the path part to
> determine what encoding to use for both of the path part and the query
> string.
No, it won't fail. WSGI servers do not perform %-decoding of the
QUERY_STRING. In the example given, a WSGI 1.1 server will set the
Python 3 environ values:
    {'SCRIPT_NAME': '',
     'PATH_INFO': '/myscript.cgi',
     'QUERY_STRING': 'x=%C2'}
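Since the query string reaches the application still %-encoded (pure ASCII), the application decides how to interpret the bytes. A minimal sketch using the standard library's `urllib.parse.parse_qs`, assuming the application expects iso-8859-7 as in the greek.example.com URL:

```python
from urllib.parse import parse_qs

# What the server hands over: the query string is still %-encoded ASCII.
environ = {"QUERY_STRING": "x=%C2"}

# The application chooses the charset when it %-decodes (iso-8859-7 assumed here).
greek = parse_qs(environ["QUERY_STRING"], encoding="iso-8859-7")
# greek == {"x": ["\u0392"]}  (Greek capital beta)

# Decoding the same byte as UTF-8 instead yields a replacement character:
# a lone 0xC2 is not valid UTF-8, and parse_qs defaults to errors="replace".
as_utf8 = parse_qs(environ["QUERY_STRING"])
# as_utf8 == {"x": ["\ufffd"]}
```

Nothing in the path influences this step, so an ASCII-only path does not force UTF-8 onto the query string.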
Robert Brewer
fumanchu at aminus.org