[Web-SIG] WSGI Amendments thoughts: the horror of charsets

Andrew Clover and-py at doxdesk.com
Wed Nov 19 01:40:57 CET 2008


> ctypes.windll.kernel32.GetEnvironmentVariableW(u'PATH_INFO', ...)

Hmm... it turns out: no. IIS appears to be mangling characters that are 
not representable in the current mbcs code page even *before* it puts 
the decoded value into the envvars.

The same is true with isapi_wsgi, which is the only other WSGI adapter I 
know of for IIS. It gets the same mangled byte string from 
GetServerVariable as Python gets from the envvars, so it looks like IIS 
is making this mistake further upstream, before the request even reaches 
the CGI handler. Maybe someone more familiar with ISAPI knows a better 
way to read PATH_INFO than GetServerVariable, but I can't see anything 
promising in MSDN.

So at the moment it would seem to be impossible to have arbitrary 
Unicode paths work under IIS: anything outside the current mbcs code 
page is already lost by the time we see it.

The ctypes approach could rescue bytes for the Apache/nt/Py2 combination 
(perhaps also via libc.getenv for Apache/posix/Py3), but Apache already 
gives us REQUEST_URI, which is a much easier workaround. There might be 
CGI servers for Windows where ctypes could serve some purpose, but I 
can't think of any currently in use other than the Big Two (Apache and 
IIS).
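
To make "process REQUEST_URI" concrete, here is a rough sketch in 
Python 3. The helper name path_bytes_from_request_uri is made up for 
illustration. It assumes REQUEST_URI holds the raw, still 
percent-encoded request target in origin form (a path plus optional 
query, not an absolute URL), and that SCRIPT_NAME is ASCII-only and is a 
literal prefix of the decoded path. None of that is guaranteed, for 
instance when URL rewriting is in play.

```python
from urllib.parse import unquote_to_bytes


def path_bytes_from_request_uri(environ):
    """Recover the submitted PATH_INFO bytes from REQUEST_URI.

    Hypothetical helper, not part of any WSGI adapter.
    """
    uri = environ["REQUEST_URI"]
    # Drop the query string; a literal '?' inside the path would have
    # arrived percent-encoded, so the first '?' ends the path.
    path = uri.partition("?")[0]
    # Undo percent-encoding without guessing at a charset.
    raw = unquote_to_bytes(path)
    # Assumption: SCRIPT_NAME is ASCII and prefixes the decoded path.
    script = environ.get("SCRIPT_NAME", "").encode("ascii")
    if script and raw.startswith(script):
        raw = raw[len(script):]
    return raw
```

The point of going through unquote_to_bytes is that the result stays a 
byte string, so the application can decide whether to try UTF-8, fall 
back to something else, or reject the request.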

In summary, to get the original submitted byte strings for PATH_INFO:

Apache/nt/Py2
     process REQUEST_URI
Apache/posix/Py2
     use PATH_INFO directly
     (or process REQUEST_URI)
Apache/nt/Py3
     encode PATH_INFO to ISO-8859-1
     (or process REQUEST_URI)
Apache/posix/Py3
     process REQUEST_URI
IIS/nt/Py2
     decode PATH_INFO from mbcs, then encode to UTF-8
     FAIL for characters not in current mbcs
     FAIL for non-UTF-8 input
IIS/nt/Py3
     encode PATH_INFO to UTF-8
     FAIL for characters not in current mbcs
     FAIL for non-UTF-8 input
wsgiref.simple_server/Py2
     use PATH_INFO directly
wsgiref.simple_server/Py3
     remains to be seen, but at the moment encode PATH_INFO to UTF-8
     FAIL for non-UTF-8 input
cherrypy.wsgiserver/Py2
     use PATH_INFO directly
cherrypy.wsgiserver/Py3
     remains to be seen, but at the moment encode PATH_INFO to UTF-8
     FAIL for non-UTF-8 input
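
The transcoding recipes in the list above (everything other than 
"process REQUEST_URI") could be sketched like this. The function name 
and the recipe labels are invented for illustration, and the code is 
written in Python 3 terms, so the Py2 cases take a bytes value where Py2 
would have had a str. The default ansi_codec of "cp1252" is only a 
stand-in so the sketch runs anywhere; on Windows one would presumably 
pass "mbcs", which exists only there.

```python
def recover_path_bytes(path_info, recipe, ansi_codec="cp1252"):
    """Apply one of the recipes from the summary list (hypothetical helper).

    recipe is one of:
      "direct"       - Apache/posix/Py2, wsgiref/Py2, cherrypy/Py2
      "latin1"       - Apache/nt/Py3
      "utf8"         - IIS/nt/Py3, wsgiref/Py3, cherrypy/Py3
      "ansi-to-utf8" - IIS/nt/Py2
    """
    if recipe == "direct":
        # The server already handed over the submitted bytes.
        return path_info
    if recipe == "latin1":
        # ISO-8859-1 maps code points 0-255 one-to-one onto bytes.
        return path_info.encode("iso-8859-1")
    if recipe == "utf8":
        # Only right if the client really sent UTF-8.
        return path_info.encode("utf-8")
    if recipe == "ansi-to-utf8":
        # Byte string in the server's ANSI code page, re-encoded as UTF-8.
        return path_info.decode(ansi_codec).encode("utf-8")
    raise ValueError("unknown recipe: %r" % (recipe,))
```

Nothing here can repair the FAIL cases: characters IIS has already 
mangled are gone before this code runs, and the "utf8" recipe cannot 
tell what a non-UTF-8 request originally contained.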

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

