[Web-SIG] Python 3.0 and WSGI 1.0.

Robert Brewer fumanchu at aminus.org
Fri May 8 17:07:13 CEST 2009


Graham Dumpleton wrote:
> Robert, do you have any comments on restricting response
> content to bytes and not allowing fallback to conversion per latin-1?
> 
> I heard that in CherryPy WSGI server you are only allowing bytes. What
> is your rationale for that at the moment?


In Python 2.x, one could easily mix unicode strings and byte strings in
the same interface, because they mostly supported the same operations.
Not so in Python 3.x--byte strings are missing everything from
capitalize() to zfill() [1]. I feel that choosing one type or the other
is required in order to avoid mountains of if-statements in middleware
(and lots of 'pass' statements if bytes are found).
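
To illustrate the branching, here is a minimal sketch of a middleware helper
that has to accept either type. The name shout() and the latin-1 default are
assumptions made for the example, not part of any WSGI spec:

```python
def shout(chunk, encoding="latin-1"):
    """Uppercase one response chunk, whichever type it arrives as.

    Hypothetical helper; the latin-1 default is only an assumption.
    """
    if isinstance(chunk, str):
        return chunk.upper()
    if isinstance(chunk, bytes):
        # Text operations on non-ASCII data need a decode/encode round trip.
        return chunk.decode(encoding).upper().encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(chunk).__name__)
```

Every text-handling function in the stack would need the same pair of
branches, which is the duplication a single agreed type avoids.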

I decided that the single type should be byte strings because I want
WSGI middleware and applications to be able to choose what encoding
their output is. Passing unicode to the server would require some
out-of-band method of telling the server which encoding to use per
response, which seemed unacceptable.
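
As a rough sketch of the bytes-only approach, the application below picks its
own encoding and hands the server finished bytes. The app name and the sample
text are made up for illustration:

```python
def app(environ, start_response):
    """WSGI application that chooses its own output encoding."""
    encoding = "utf-8"  # the application's choice, not the server's
    body = "Gr\u00fc\u00dfe aus dem Web".encode(encoding)
    headers = [
        ("Content-Type", "text/plain; charset=%s" % encoding),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # The server only ever sees bytes and never needs to know the encoding.
    return [body]
```

The charset in Content-Type is there for the client; nothing obliges the
server or middleware to read it.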

The downside, already alluded to, is that middleware cannot then call
e.g. response.capitalize() or any of a number of other methods without
first decoding the response. And it cannot do that reliably unless
(again) the encoding which was used to produce bytes is communicated
down the stack out of band.
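
A small illustration of why guessing is unreliable: decoding UTF-8 bytes as
latin-1 raises no error, it just silently yields the wrong text. The sample
string is arbitrary:

```python
payload = "caf\u00e9".encode("utf-8")  # what the application actually sent

# Middleware that guesses latin-1 gets no exception, only mojibake.
guessed = payload.decode("latin-1")
correct = payload.decode("utf-8")

print(repr(guessed.capitalize()))
print(repr(correct.capitalize()))
```

Because latin-1 maps every byte to a character, a wrong guess cannot be
detected from the decode step alone.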

The python3 branch of CherryPy is by no means complete. I'd be happy to
explore emitting unicode if we could decide on a method whereby apps
could inform the server which encoding they want. Middleware which
transcoded the response would need a means of overriding that. But of
course, that opens a whole new can of worms if something goes wrong,
because application authors want control over the error response; if the
server is encoding the response, and an error occurs, there would have
to be a way to pass control back up the stack to... what? Whichever
component last set the encoding? That road starts to get complicated
very quickly.

If some middleware needs to treat the response as unicode, I'd rather
emit bytes and somehow return the encoding as part of the response.
Perhaps WSGI 2's mythical "return (status, headers, body-iterable,
encoding)". Middleware could then decode/transcode as desired. I can't
think of a downside to that, other than some lost cycles spent
de/encoding, but perhaps there are some I don't yet foresee.
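
A minimal sketch of how that might look, assuming the hypothetical four-tuple
return shape quoted above. No such interface has been standardized, and the
names app, transcode and transcode_body are invented for the example:

```python
import codecs


def app(environ):
    """Hypothetical WSGI-2-style app: returns bytes plus their encoding."""
    encoding = "utf-8"
    body = ["h\u00e9llo ".encode(encoding), "w\u00f6rld".encode(encoding)]
    return "200 OK", [("Content-Type", "text/plain")], body, encoding


def transcode_body(body, source, target):
    """Re-encode an iterable of byte chunks from source to target."""
    # An incremental decoder copes with characters split across chunks.
    decoder = codecs.getincrementaldecoder(source)()
    for chunk in body:
        text = decoder.decode(chunk)
        if text:
            yield text.encode(target)
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode(target)


def transcode(wrapped, target="latin-1"):
    """Middleware that re-encodes the response and reports the new encoding."""
    def middleware(environ):
        status, headers, body, encoding = wrapped(environ)
        return status, headers, transcode_body(body, encoding, target), target
    return middleware
```

Calling transcode(app)({}) returns the same status and headers with a body
re-encoded to latin-1 and "latin-1" as the fourth element, so components
further down the stack always know how to decode what they receive.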


Robert Brewer
fumanchu at aminus.org

[1] See http://docs.python.org/dev/py3k/library/stdtypes.html#string-methods