[Python-Dev] PEP 3333: wsgi_string() function

Fri Jan 7 06:12:16 CET 2011

At 04:00 PM 1/6/2011 -0800, Raymond Hettinger wrote:
>Can you please take a look at
><http://docs.python.org/dev/whatsnew/3.2.html#pep-3333-python-web-server-gateway-interface-v1-0-1>
>to see if it accurately recaps the resolution of the WSGI text/bytes issues.
>I would appreciate any feedback, as it is likely that the whatsnew
>document will be most people's first chance to hear the outcome
>of the multi-year discussion.

Hi Raymond -- nice work there.  A few minor suggestions:

1. Native strings are used as the keys and values of the environ 
dictionary, not just as headers for start_response.

2. The read_environ() function is strictly for use with CGI-to-WSGI 
gateways, or for bridging other CGI-like protocols (e.g. FastCGI) to 
WSGI.  It is ONLY for server implementers, in other words, and the 
typical app developer is doing something terribly wrong if they are 
even bothering to read its documentation.  ;-)

3. The primary relevance of the "native string" type to an app 
developer is that when porting code from Python 2 to 3, they must 
still decode environment variable values, even though they are 
"already" Unicode.  If their code was previously dealing only in 
Python 2 'str' objects, then nothing really changes.  If they were 
previously decoding from environ str's to unicode, then they must 
replace their prior .decode('whatever') with 
.encode('latin1').decode('whatever').  That's basically it for 
porting from Python 2.

IOW, this design choice allows most HTTP header manipulating code 
(whether input or output) to be ported to Python 3 with a very 
mechanical change pattern.  Most such code is working with ASCII 
anyway, since normally both input and output headers are, and there 
are few headers that an application would be likely to convert to 
actual unicode anyway.

On output via start_response(), if an application is currently 
encoding an output header -- why they would be, I have no idea, but 
if they are -- they need to add a re-encode to latin1.  (i.e., 
.encode('whatever').decode('latin1'))
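A minimal sketch of the output-side pattern, assuming UTF-8 as the "whatever" encoding. The helper name to_header_value and the X-Display-Name header are hypothetical, chosen only to show the round trip.

```python
def to_header_value(text, encoding="utf-8"):
    """Hypothetical helper: turn real text into a WSGI native string."""
    # Encode to the bytes that should go on the wire, then map each byte
    # to the code point of the same value so the result is still a str.
    return text.encode(encoding).decode("latin1")


def app(environ, start_response):
    # X-Display-Name is a made-up header used only for this example.
    name = to_header_value("Jos\u00e9")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("X-Display-Name", name),
    ])
    return [b"ok"]
```

A server can then call .encode('latin1') on the header value and get back exactly the bytes the application intended.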

IOW, a short 2-to-3 porting guide for WSGI:

* If you just used strings for headers before, that part of your code 
doesn't change.  (And if it was broken before, it's still broken in 
exactly the same way.  No new breakage is introduced. ;-) )

* If you encoded any output headers or decoded any input headers, you 
must take into account the extra latin1 step.  This is expected to be 
rare, since it's usually only SCRIPT_NAME and PATH_INFO that anybody 
would ever care about on input, and almost never anything on output.

* Values yielded by an application or sent via a write() call MUST be 
byte strings; the environ keys and values, and the status and headers 
passed to start_response(), MUST be native strings.  No mixing and 
matching.
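
The rules above can be sketched as a tiny Python 3 application. This is only an illustration: the app and the fake_start_response stand-in are hypothetical, and wsgiref.util.setup_testing_defaults is used just to fill in a plausible environ without running a real server.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    # Body: bytes only, so encode text explicitly before yielding it.
    body = "Hello, w\u00f6rld\n".encode("utf-8")
    # Status and headers: native strings (str), never bytes.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for a real server's start_response; it only records its
    # arguments and returns a do-nothing write() callable.
    captured["status"] = status
    captured["headers"] = headers
    return lambda data: None


environ = {}
setup_testing_defaults(environ)
chunks = list(app(environ, fake_start_response))
```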