[Web-SIG] WSGI, Python 3 and Unicode

Ian Bicking ianb at colorstudy.com
Fri Dec 7 21:24:56 CET 2007


Phillip J. Eby wrote:
> So here are my recommendations so far for the addendum to WSGI *1.0* for 
> Python 3.0 (I expect we can be more strict for WSGI 2.0):
> 
> * When running under Python 3, applications SHOULD produce bytes output 
> and headers
> 
> * When running under Python 3, servers and gateways MUST accept strings 
> as application output or headers, under the existing rules (i.e., 
> s.encode('latin-1') must convert the string to bytes without an exception)
> 
> * When running under Python 3, servers MUST provide CGI HTTP variables 
> as strings, decoded from the headers using HTTP standard encodings (i.e. 
> latin-1 + RFC 2047)  (Open question: are there any CGI or WSGI variables 
> that should NOT be strings?)
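
For reference, the "existing rules" check in the second point comes down to something like the sketch below. The name to_wire_bytes is made up for illustration and is not part of WSGI or any server.

```python
def to_wire_bytes(value):
    """Hypothetical helper: turn application output or a header value into bytes."""
    if isinstance(value, bytes):
        # Already bytes, which is what applications SHOULD produce.
        return value
    # A str is acceptable only if every code point fits in latin-1
    # (U+0000 to U+00FF); anything else raises UnicodeEncodeError.
    return value.encode('latin-1')
```

So a string like 'caf\xe9' passes, while one containing a euro sign ('\u20ac') is rejected.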

I believe that SCRIPT_NAME/PATH_INFO would be UTF-8 encoded, not latin-1.
That is, after you urldecode the values (as WSGI asks you to do), the
proper conversion to text is to decode them as UTF-8.
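
A rough sketch of the difference, assuming a current Python 3 where urllib.parse.unquote_to_bytes is available. The decode_path name and the sample path are invented for illustration; they are not anything WSGI defines.

```python
from urllib.parse import unquote_to_bytes

def decode_path(raw_path):
    """Hypothetical helper: percent-decode a raw request path, then decode as UTF-8."""
    return unquote_to_bytes(raw_path).decode('utf-8')

# Made-up example: "/café" as a browser would typically send it.
raw = '/caf%C3%A9'

as_utf8 = decode_path(raw)                           # the intended text
as_latin1 = unquote_to_bytes(raw).decode('latin-1')  # two characters where one was meant
```

Decoding the same bytes as latin-1 never raises, so the mistake shows up only as mangled text (here 'Ã©' in place of 'é').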

I'm a bit confused about how HTTP_COOKIE gets encoded, and QUERY_STRING
confuses me too.

Is this all compatible with os.environ in py3k?  I don't care that much
if it is, but since os.environ is the starting point for CGI it would be
nice if the two stayed in sync.

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

