[Web-SIG] WSGI, Python 3 and Unicode

Guido van Rossum guido at python.org
Fri Dec 7 06:06:26 CET 2007


On Dec 6, 2007 8:00 PM, Ian Bicking <ianb at colorstudy.com> wrote:
> Phillip J. Eby wrote:
> > At 08:08 PM 12/6/2007 -0500, Adam Atlas wrote:
> >
> >> On 6 Dec 2007, at 18:13, Graham Dumpleton wrote:
> >>> In Python 3 the default for string type objects will effectively be
> >>> Unicode. Is WSGI going to be made to somehow cope with that, or will
> >>> applications be required to return byte string objects instead?
> >> I'd say it would be best to only accept `bytes` objects; anything else
> >> would require some guesswork. Maybe, at most, it could try to encode
> >> returned Unicode objects as ISO-8859-1, and have it be an error if
> >> that's not possible.
> >
> > Actually, I'd prefer to look at it the other way around: a Python 3
> > WSGI server or middleware *may* accept bytes objects instead of str.
> >
> > This is relatively easy for the response side of things, but the
> > request side is rather more difficult, since wsgi.input may need to
> > be binary rather than text mode.  (I think we can reasonably assume
> > that wsgi.errors is a text mode stream, and should support a
> > reasonable encoding.)
>
> wsgi.input definitely seems like it should be bytes to me.  Unless we
> want to put the encoding process into the server.  Not entirely
> infeasible, but a bit of a strain.  And the request body might very well
> be binary, e.g., on a PUT.
>
> The CGI keys in the environment don't feel at all like bytes to me, but
> then they aren't unicode either.  They can be unicode, again given a bit
> of work on the server side.  Though unfortunately browsers are very poor
> at indicating their encoding for requests, and it ends up being policy
> and configuration as much as anything that determines the encoding of
> stuff like wsgi.input.  I believe all request paths are UTF-8 (?), but
> I'm not sure about QUERY_STRING.  I'm a little fuzzy on some of the
> details there.
>
> The actual response body should also be bytes.  Unless again we want to
> introduce upstream encoding.
>
> This does make everything feel more complicated.

It's the same level of complexity you run into as soon as you want to
handle Unicode with WSGI in 2.x though, as it is caused by something
outside our control (HTTP and browsers).
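
To make the bytes-in, bytes-out position from the thread concrete, here is a
minimal sketch in Python 3. It is only an illustration under stated
assumptions, not a proposal for the spec: the helper names
(to_response_bytes, echo_app, call_app) are hypothetical, the fallback of
encoding str as ISO-8859-1 follows Adam's suggestion above, and the choice of
UTF-8 for the request body is an assumed policy, since nothing in the request
reliably tells us the encoding.

```python
import io


def to_response_bytes(chunk):
    """Return a response chunk as bytes.

    bytes pass through unchanged. str is encoded as ISO-8859-1 (the fallback
    suggested in the thread); characters outside Latin-1 raise
    UnicodeEncodeError instead of being guessed at.
    """
    if isinstance(chunk, bytes):
        return chunk
    if isinstance(chunk, str):
        return chunk.encode("iso-8859-1")
    raise TypeError("response chunks must be bytes or str")


def echo_app(environ, start_response):
    """Hypothetical WSGI app: reads a bytes body, returns a bytes body."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length)  # bytes, not text

    # Assumption: the request body is UTF-8. This is policy/configuration,
    # not something HTTP or the browser guarantees.
    text = raw.decode("utf-8")

    payload = text.upper().encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(payload))),
        ],
    )
    return [payload]


def call_app(app, raw_body):
    """Tiny stand-in for a server: build an environ and collect the output."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    environ = {
        "REQUEST_METHOD": "PUT",
        "CONTENT_LENGTH": str(len(raw_body)),
        "wsgi.input": io.BytesIO(raw_body),  # binary stream
    }
    chunks = [to_response_bytes(c) for c in app(environ, start_response)]
    return captured["status"], b"".join(chunks)
```

Calling call_app(echo_app, "h\u00e9llo".encode("utf-8")) exercises the whole
path: the body arrives as bytes, is decoded under the assumed policy, and
leaves as bytes again.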

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

