[Web-SIG] Python 3.0 and WSGI 1.0.

Tue May 5 12:04:02 CEST 2009

2009/5/5 Armin Ronacher <armin.ronacher at active-4.com>:
> Hi,
>
> Graham Dumpleton wrote:
>> I can't see that I have any choice but to pass such settings through as
>> strings; otherwise it would more than likely cause problems for applications.
>> The problem is that it isn't clear what encoding values in the Apache
>> configuration may use. At the moment, latin-1 is assumed.
>
> Because that information does not have a specified encoding, I see
> nothing wrong with passing it as bytestrings.  I would
> have no problem passing *all* values as bytestrings.
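To make the two options concrete: latin-1 maps each of the 256 byte values to the code point with the same number, so any byte sequence, whatever its real encoding, survives a decode/encode round trip unchanged. The minimal Python 3 sketch below assumes the raw configuration value arrives as bytes; the variable names are illustrative only and not taken from mod_wsgi.

```python
# Raw bytes for a configuration value whose real encoding is unknown.
# Here they happen to be UTF-8 for "café", but the server cannot know that.
raw = "café".encode("utf-8")

# Option 1: pass the value through untouched as bytes.
as_bytes = raw

# Option 2: pass it as a str decoded with latin-1. This never fails,
# because every byte 0x00-0xFF maps to exactly one code point U+0000-U+00FF.
as_latin1 = raw.decode("latin-1")

# The latin-1 str is lossless: encoding it back yields the original bytes,
# so an application that knows the real encoding can still recover the text.
recovered = as_latin1.encode("latin-1").decode("utf-8")

# The same lossless round trip holds for every possible byte value.
all_bytes = bytes(range(256))
assert all_bytes.decode("latin-1").encode("latin-1") == all_bytes

print(repr(as_bytes))    # b'caf\xc3\xa9'
print(repr(as_latin1))   # 'cafÃ©'
print(recovered)         # café
```

The latin-1 str is "correct" only in the sense that no information is lost; an application still sees mojibake until it re-encodes and decodes with the right codec, which is the inconvenience raised in the reply that follows.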

At what point does that become an inconvenience, though? That is
my concern: if one has to do too many manual conversions in
an application, people will start to complain that it is unwieldy to
use. In other words, you make things easier or more logical for
frameworks, but do you end up putting more burden on applications for
values outside those core ones?

So, for the core CGI values that the framework is going to modify
before an application even sees them, fine. But is the framework also
going to set the rules for what encoding applies to other values in
the WSGI environment, and convert them per that encoding when an
application requests them, or is the application always going to have
to deal with them as bytes?
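One way the "framework sets the rules" option could look, as a minimal Python 3 sketch: the adapter leaves some values as bytes (assuming the all-bytes approach suggested above), and a hypothetical framework helper, `environ_text`, decodes them on request with whatever encoding the framework chooses. Neither the helper nor the `myapp.setting` key comes from any existing framework or adapter.

```python
from typing import Mapping, Optional, Union

EnvValue = Union[bytes, str]


def environ_text(environ: Mapping[str, EnvValue], key: str,
                 encoding: str = "utf-8",
                 default: Optional[str] = None) -> Optional[str]:
    """Return environ[key] as text, decoding bytes with `encoding`.

    Hypothetical framework helper: the framework, not the WSGI adapter,
    decides which encoding applies. str values are returned unchanged.
    """
    value = environ.get(key)
    if value is None:
        return default
    if isinstance(value, bytes):
        # errors="replace" keeps a malformed value from raising; a stricter
        # framework might prefer the default "strict" handling instead.
        return value.decode(encoding, errors="replace")
    return value


environ = {
    "REQUEST_METHOD": "GET",           # core CGI value, already text
    "QUERY_STRING": b"q=caf\xc3\xa9",  # assumed: raw bytes from the request
    "myapp.setting": b"caf\xe9",       # hypothetical latin-1 config value
}

print(environ_text(environ, "REQUEST_METHOD"))                  # GET
print(environ_text(environ, "QUERY_STRING"))                    # q=café
print(environ_text(environ, "myapp.setting", "latin-1"))        # café
print(repr(environ_text(environ, "HTTP_MISSING", default="")))  # ''
```

The cost Graham describes shows up directly: every non-core lookup has to go through a helper like this, and the application must know, or be told, which encoding each key uses.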

As I keep saying, those of you who write the frameworks and applications
are going to know better than I do. I am just challenging the notions as
a way of making people think about it, so that the end result is the
most logical thing to do. ;-)

>> In Python 2.X, some WSGI adapters only allow Python 2.X strings (i.e.,
>> bytes) and reject unicode strings. Others will convert unicode
>> strings, but rather than use latin-1, they apply the default Python
>> encoding. Thus, there is no consistency.
>
> I think most will assert-reject unicode types and, under -O, just ignore them
> and fail in some way.  I haven't seen any of them doing a
> unicode->string conversion by encoding, which btw is disallowed by the
> PEP anyway.

A CGI/WSGI bridge, if no explicit checks are made to disallow anything
other than strings, will usually attempt to write whatever you give it
to sys.stdout. Thus unicode strings can be written, and
presumably the default encoding is applied.

>>> import sys
>>> sys.stdout.write(u"abcd\n")
abcd

One can even write buffers.

>>> sys.stdout.write(buffer("abcd\n"))
abcd
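For contrast with the permissive Python 2 behaviour shown above, here is a minimal Python 3 sketch of a bridge write path that only accepts bytes. It is not taken from mod_wsgi, wsgiref or any other adapter. It uses an explicit isinstance check rather than `assert`, because assert statements are stripped under `python -O`, which is how unchecked values end up "failing in some way" as described above.

```python
import io
import sys
from typing import BinaryIO, Callable, Optional


def make_write(stream: Optional[BinaryIO] = None) -> Callable[[bytes], None]:
    """Return a WSGI-style write() callable that accepts only bytes.

    Output goes to `stream`, or to the binary layer of stdout by default,
    so no implicit text encoding is ever applied.
    """
    out = stream if stream is not None else sys.stdout.buffer

    def write(data: bytes) -> None:
        # Explicit check: unlike an assert, this still runs under -O.
        if not isinstance(data, bytes):
            raise TypeError("write() argument must be bytes, not %s"
                            % type(data).__name__)
        out.write(data)
        out.flush()

    return write


buf = io.BytesIO()
write = make_write(buf)
write(b"abcd\n")        # accepted: already bytes
try:
    write("abcd\n")     # rejected: a str would need an implicit encoding
except TypeError as exc:
    print(exc)          # write() argument must be bytes, not str
print(buf.getvalue())   # b'abcd\n'
```

This sketch is deliberately strict: it also rejects bytearray and memoryview, the closest Python 3 counterparts to the `buffer` example. A more lenient adapter could accept those, but it would then have to document that choice, which is exactly the kind of inconsistency being complained about.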

>> Ultimately I am just implementing the WSGI adapter; I'll follow
>> whatever is decided. Since I don't develop stuff that runs on it, I am
>> not in a position to know what is best. So, as long as it is
>> clear what should be passed through as bytes in the environment, i.e.,
>> there is an all-inclusive list and I don't somehow have to guess, then
>> I am fine either way. I'd just like to see a decision, and for that
>> decision not to come some time next year, as I am holding up mod_wsgi 3.0
>> until things have been clarified. :-(
>
> I hope we can find a solution for that before the Python 3.1 release;
> otherwise there will be another wsgiref release with the current behavior,
> which is just wrong.

We can hope, but I'm not holding my breath.

It is going to be rather stupid, though, if what ends up being the
standard is dictated by how wsgiref works in 3.1 as it stands.

Graham