[Web-SIG] Python 3.0 and WSGI 1.0.

P.J. Eby pje at telecommunity.com
Sat May 9 00:00:47 CEST 2009


At 10:37 AM 5/8/2009 -0700, Robert Brewer wrote:
>It also explicitly states that "HTTP does not directly support Unicode,
>and neither does this interface. All encoding/decoding must be handled
>by the application; all strings passed to or from the server must be
>standard Python BYTE STRINGS (emphasis mine), not Unicode objects. The
>result of using a Unicode object where a string object is required, is
>undefined."

It also says what the interpretation is when 'str' is a unicode string type.

>PEP 333 is difficult to interpret because it uses the name "str"
>synonymously with the concept "byte string", which Python 3000 defies. I
>believe the intent was to differentiate unicode from bytes, not elevate
>whatever type happens to be called "str" on your Python du jour. It was
>and is a mistake to standardize on type names ("str") across platforms
>and not on type behavior ("byte string").

Ironically, 'str' is what's consistent in type behavior; the bytes 
type doesn't supply the same operations.


>If Python3 WSGI apps emit unicode strings (py3k type 'str'), you're
>effectively saying the server will always call
>"chunk.encode('latin-1')". That negates any benefit of using unicode as
>the type for the response. That's not "supporting unicode"; that's using
>unicode exactly as if it were an opaque byte string. That seems silly
>to me when there is a perfectly useful byte string type.

Compatibility sometimes demands we do silly things.  Personally, I 
think it's kind of silly that Python 3 files return incompatible data 
types depending on what mode you open them in, but there's not a 
whole lot we can do about that.

Meanwhile, existing WSGI code ported to Python 3 is going to yield 
strings until/unless manually converted; AFAIK 2to3 has no way to 
automatically detect WSGI-ness and convert your strings to bytes.
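
To make the porting situation concrete, here is a rough sketch. Both
apps are hypothetical, made up for illustration: the first is what a
mechanically ported app ends up looking like, the second is the same
app after someone converts the body by hand.

```python
def legacy_app(environ, start_response):
    # Ported as-is: under Python 3 these literals are unicode 'str'.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Hello, world!\n']


def converted_app(environ, start_response):
    # Manually converted: the response body is now bytes.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, world!\n']
```

Nothing in the first function says "this is a WSGI response body", so
a source-to-source tool has little to go on when deciding which
literals should become bytes.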


>I don't see any benefit to that.

There isn't any benefit to doing it by *hand*.  However, backward 
compatibility demands that servers *accept* such strings, as they may 
be generated by legacy apps.

That's why the Python 3 WSGI amendments say servers MUST accept this, 
even though applications SHOULD supply bytes.

That is, for new code, we do want bytes.  What we don't want, ever, 
is unicode characters above #255 in any unicode strings sent as part 
of the response body.
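
On the server side, that rule might look something like the sketch
below. The helper name is invented for illustration and this is not
code from any actual server; it only assumes that latin-1 maps code
points 0 through 255 to the same byte values and rejects anything
higher.

```python
def body_chunk_to_bytes(chunk):
    """Normalize one response-body chunk to bytes (illustrative only)."""
    if isinstance(chunk, bytes):
        # What new code should supply: pass it through untouched.
        return chunk
    if isinstance(chunk, str):
        # Legacy apps may yield str; each character must fit in one byte.
        # Raises UnicodeEncodeError for any character above U+00FF.
        return chunk.encode('latin-1')
    raise TypeError('body chunk must be bytes or str, got %r' % type(chunk))
```

A str chunk such as 'caf\xe9' goes through as the four bytes
b'caf\xe9', while a chunk containing '\u0100' fails with
UnicodeEncodeError instead of being silently mangled.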


