[Web-SIG] Request for Comments on upcoming WSGI Changes

And Clover and-py at doxdesk.com
Mon Sep 21 20:15:28 CEST 2009


Armin Ronacher wrote:

> The middleware can never know.

It's much more likely to know than the server is, though!

 > WSGI will demand UTF-8 URLs and only
 > provide iso-XXX support for backwards compatibility.

It doesn't sound much like backwards compatibility to me if non-UTF-8 
URLs break as soon as they coincidentally happen to be UTF-8 byte 
sequences. I'm as much an advocate of "UTF-8 for everything everywhere!" 
as anyone else, but unfortunately today there are still dark places 
where you need non-UTF-8 URLs.
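To make the "coincidentally UTF-8" failure concrete, here is a minimal sketch. It assumes a server that implements "UTF-8, with iso-XXX for backwards compatibility" as "try UTF-8 first, fall back to ISO-8859-1 on a decode error". That policy and the function name decode_path_utf8_first are hypothetical, used only for illustration.

```python
def decode_path_utf8_first(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Hypothetical server policy: try UTF-8 first, fall back to a legacy charset."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# A legacy site whose URLs really are ISO-8859-1.
plain = "/\u00e9"            # "/é"  encodes to b"/\xe9", which is not valid UTF-8
unlucky = "/\u00c3\u00a9"    # "/Ã©" encodes to b"/\xc3\xa9", which happens to be valid UTF-8

print(decode_path_utf8_first(plain.encode("iso-8859-1")))    # "/é": correct
print(decode_path_utf8_first(unlucky.encode("iso-8859-1")))  # "/é": wrong, the app expected "/Ã©"
```

The second path is contrived, but any ISO-8859-1 byte sequence that forms valid UTF-8 is silently decoded to different characters, with no error for anyone to notice.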

Incidentally, if wsgi.uri_encoding is going to be the way to signal that 
the server has decoded bytes to characters using a known encoding, it 
should be stressed that the key must only be set when that encoding is 
known for certain.

That is, wsgi.uri_encoding should be omitted (or None?) in cases where 
another party has already decoded (and maybe mangled) the bytes using an 
unknown encoding. In particular, CGI.
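A minimal sketch of the "only set it when certain" rule, assuming the wsgi.uri_encoding key as proposed in this thread (it is not part of WSGI 1.0). The helpers set_path_info and recover_path_bytes are hypothetical: the server side advertises an encoding only when it did the decoding itself, and the application side refuses to guess when the key is absent.

```python
from typing import Optional

URI_ENCODING_KEY = "wsgi.uri_encoding"  # key as proposed in this thread, not in WSGI 1.0


def set_path_info(environ: dict, path: str, known_encoding: Optional[str]) -> None:
    """Hypothetical server helper: only advertise an encoding we actually used."""
    environ["PATH_INFO"] = path
    if known_encoding is not None:
        environ[URI_ENCODING_KEY] = known_encoding
    # Otherwise (e.g. under CGI) leave the key out: someone else decoded the
    # bytes with a charset we cannot determine.


def recover_path_bytes(environ: dict) -> Optional[bytes]:
    """Hypothetical app helper: get the original bytes back, or None if unknowable."""
    encoding = environ.get(URI_ENCODING_KEY)
    if encoding is None:
        return None  # refuse to guess
    # Exact only if the server decoded strictly with this encoding.
    return environ["PATH_INFO"].encode(encoding)


direct = {}
set_path_info(direct, "/caf\u00e9", "utf-8")
print(recover_path_bytes(direct))   # b'/caf\xc3\xa9'

via_cgi = {}
set_path_info(via_cgi, "/caf\u00e9", None)
print(recover_path_bytes(via_cgi))  # None
```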

(In the case of Windows CGI the server will have decoded URI bytes into 
Unicode characters using a charset that is impossible to determine 
afterwards. In Apache it's iso-8859-1; in IIS it's UTF-8 as long as the 
bytes form a valid UTF-8 sequence, otherwise it's the system codepage. 
This problem affects the non-CGI implementation isapi_wsgi, too. The 
variables are then read as environment variables, which for Python 2 
means another encode/decode step on Windows using the system codepage, 
mangling any characters outside that codepage. Python 3 has the opposite 
problem: it reads byte envvars using UTF-8, which won't be how Apache put 
them there.)
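A rough simulation of the Apache-plus-Python-2 chain described above, for illustration only. It assumes cp1252 as the system codepage and approximates the Windows ANSI conversion with Python's errors="replace"; real Windows conversion may apply best-fit mappings rather than "?". The function name is hypothetical.

```python
def simulate_apache_cgi_on_python2(raw_path: bytes, system_codepage: str = "cp1252") -> bytes:
    # Step 1: Apache decodes the URI bytes to Unicode as ISO-8859-1.
    unicode_path = raw_path.decode("iso-8859-1")
    # Step 2 (approximation): Python 2's os.environ on Windows sees the value
    # re-encoded to the system codepage; unmappable characters become "?".
    return unicode_path.encode(system_codepage, errors="replace")


cafe = "/caf\u00e9".encode("utf-8")   # b"/caf\xc3\xa9"
cyrillic = "/\u041f".encode("utf-8")  # "/П" as UTF-8: b"/\xd0\x9f"

print(simulate_apache_cgi_on_python2(cafe))      # b'/caf\xc3\xa9': survives
print(simulate_apache_cgi_on_python2(cyrillic))  # b'/\xd0?': byte 0x9F is lost
```

Byte 0x9F decodes to U+009F under ISO-8859-1, which cp1252 cannot encode, so the original URL bytes are unrecoverable by the time the application sees them.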

If wsgi.uri_encoding is obligatory then in reality it will often be wrong, 
leaving us in the same pathetic predicament as with WSGI 1.0, where 
non-ASCII URIs don't work reliably at all.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/


