[Web-SIG] WSGI 2

James Y Knight foom at fuhm.net
Wed Aug 5 03:48:30 CEST 2009


On Aug 4, 2009, at 8:53 PM, Graham Dumpleton wrote:

> 2. How would use of bytes work for a CGI-WSGI bridge given that
> os.environ is not bytes? Where does one get what encoding was used for
> os.environ values so it can be converted back to bytes?

On Unix it's simple enough:
On py2.X: environ is bytes already.
On py3.0: you're screwed, because env vars that couldn't be decoded were discarded already.
On py3.1+: 'string'.encode(sys.getfilesystemencoding(), 'surrogateescape') should do it.
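
To make the py3.1+ case concrete, here is a minimal sketch of that encode step. The helper name environ_value_to_bytes and its optional encoding argument are made up for illustration; only the encode call itself comes from the line above.

```python
import sys


def environ_value_to_bytes(value, encoding=None):
    """Recover the raw bytes behind a str read from os.environ.

    Assumes Unix and py3.1+, where undecodable bytes were smuggled
    into the str as lone surrogates by the 'surrogateescape' handler.
    """
    if encoding is None:
        # Hypothetical convenience: default to the encoding Python
        # used when it decoded the environment.
        encoding = sys.getfilesystemencoding()
    return value.encode(encoding, "surrogateescape")
```

In a CGI->WSGI bridge this would be applied to each os.environ value before handing it to the application, e.g. environ_value_to_bytes(os.environ['PATH_INFO']).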

On Windows, I guess the OS environment is unicode, so I don't know
precisely what to do to reversibly obtain the bytes sent from the
end-user's browser. It looks to me from the source code as if Apache will
encode the bytes from the client (utf-8 or otherwise!) as the Unicode
values 0x00 to 0xFF in the Windows environment, that is, as if
decoding the client input as latin-1. But it does that for the
following keys only:
HTTP_*
SERVER_*
REQUEST_*
QUERY_STRING
PATH_INFO
PATH_TRANSLATED
(from http://svn.apache.org/repos/asf/httpd/httpd/trunk/modules/arch/win32/mod_win32.c)

Other values are decoded from utf-8. (Values passed through from an
enclosing environment come out untouched: they are encoded into
utf-8 for internal use and then decoded back from utf-8 when put back
in the Windows environment.)
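
If that reading of the Apache source is right, the reverse transformation on Windows could look roughly like the sketch below. The function and constant names are hypothetical, and the key list is simply the one quoted above, not something checked against every Apache version.

```python
# Keys whose values Apache (per the reading above) stores as code points
# 0x00-0xFF, i.e. as if the client bytes had been decoded as latin-1.
LATIN1_PREFIXES = ("HTTP_", "SERVER_", "REQUEST_")
LATIN1_KEYS = frozenset(["QUERY_STRING", "PATH_INFO", "PATH_TRANSLATED"])


def windows_cgi_value_to_bytes(key, value):
    """Guess the original bytes for one CGI variable under Apache on Windows.

    Assumption: the latin-1 keys never contain code points above 0xFF;
    if one does, encode() raises UnicodeEncodeError rather than guessing.
    """
    if key.startswith(LATIN1_PREFIXES) or key in LATIN1_KEYS:
        return value.encode("latin-1")
    # Everything else is assumed to have been decoded from utf-8.
    return value.encode("utf-8")
```

For example, a client sending the utf-8 bytes C3 A9 in the path would show up in PATH_INFO as the two characters U+00C3 U+00A9, and encoding those as latin-1 gives back the original two bytes.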

I'll note that while it's important to get this transformation correct
for a CGI->WSGI bridge to work right on Windows, and it's thus
definitely a useful discussion to have here, it doesn't actually need
to be part of the WSGI spec.

James

