[Web-SIG] WSGI for Python 3

Graham Dumpleton graham.dumpleton at gmail.com
Fri Aug 27 06:17:09 CEST 2010


On 27 August 2010 13:45, P.J. Eby <pje at telecommunity.com> wrote:
> At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
>>
>> Hi,
>>
>> Is there a status update on that now I missed?  Did something decide on
>> bytes for the environment values or are we still unsure about that?
>
> To the extent we're "unsure", I think the holdup is simply that nobody has
> tried doing an all-bytes WSGI implementation -- unless of course you count
> all our Python 2.x experience as experience with an all-bytes
> implementation.  ;-)
>
> (Of course, that experience won't help us with Python 3 stdlib issues.)
>
>
>> At that point I don't care at all about what is decided on as long as
>> something is decided.  Can someone please stand up and just do that? :)
>
> Essentially the problem right now is that unless such a choice is made,
> there's little hope of getting the stdlib issues to be resolved, because we
> can't exactly file bug reports against the stdlib if we don't know what we
> want it to do.  ;-)
>
> My personal inclination is to define WSGI 2 as a bytes-oriented protocol,
> and then encourage people to port to WSGI 2 before moving to Python 3.

Since the major stumbling block to any sort of agreement, irrespective
of other changes, is still bytes vs unicode, and since we have a
reasonably clear definition of what the unicode suggestion is, can we
please as a first step get a definition of what bytes actually implies
so everyone knows what we are talking about? I specifically ask this
because it isn't clear: people don't explain in detail what they mean
when they say 'bytes'.

Going back to my definition #2 in my blog post from a year ago, I had:

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application must be a byte string.

6. The list of response headers specified by the WSGI application must
contain tuples consisting of two values, where each value is a byte
string.

7. The iterable returned by the application and from which response
content is derived, must yield byte strings.
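
To make the seven points concrete, here is a minimal sketch of how
definition #2 might look on Python 3. It is only an illustration under
stated assumptions: build_environ() is a hypothetical server-side
helper, not anything from an existing server, and the sample request
is made up.

```python
import io


def build_environ(raw_cgi, url_scheme, body):
    """Hypothetical server-side helper: build an environ per definition #2."""
    environ = {}
    for name, value in raw_cgi.items():
        # Point 1: keys are native strings; byte names decode as ISO-8859-1.
        # Point 3: CGI values are left as byte strings, untouched.
        environ[name.decode("iso-8859-1")] = value
    # Point 2: 'wsgi.url_scheme' is a native string.
    environ["wsgi.url_scheme"] = url_scheme
    # Point 4: 'wsgi.input' yields byte strings.
    environ["wsgi.input"] = io.BytesIO(body)
    return environ


def application(environ, start_response):
    # The CGI value is bytes, so it can only be joined with bytes.
    body = b"You asked for " + environ["PATH_INFO"]
    # Points 5 and 6: the status line and header values are byte strings.
    start_response(b"200 OK", [
        (b"Content-Type", b"text/plain"),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ])
    # Point 7: the returned iterable yields byte strings.
    return [body]
```

Note that the application never sees a native string except as a
dictionary key and as the value of 'wsgi.url_scheme'.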

The points of disagreement I have seen about this are as follows.

For (1), the keys should also be bytes, including names of 'wsgi.' special keys.

For (2), the value of 'wsgi.url_scheme' should be bytes.

So, do you really want bytes absolutely everywhere, or are keys still
going to be unicode taken as ISO-8859-1?
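
The difference matters in practice because on Python 3 a byte string
never compares equal to a unicode string. A rough sketch, with two
made-up environ dictionaries standing in for the two interpretations:

```python
# Interpretation A: native string keys, byte string CGI values.
native_keys = {"wsgi.url_scheme": "http", "PATH_INFO": b"/index"}

# Interpretation B: bytes absolutely everywhere, keys included.
bytes_keys = {b"wsgi.url_scheme": b"http", b"PATH_INFO": b"/index"}


def path_info(environ):
    """Application code written against interpretation A."""
    return environ["PATH_INFO"]
```

Here path_info(native_keys) returns the byte string, while
path_info(bytes_keys) raises KeyError, so code written for one
interpretation does not run against the other.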

Note that we are not agreeing to the final solution here, just what
bytes means in contrast to the unicode option, so we know that we are
comparing only two options and not many options because people have
different interpretations of what bytes means.

By contrast, what we generally mean by the unicode option is
definition #3 from my blog post. That being:

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are native strings. Where native strings are unicode
strings, ISO-8859-1 encoding would be used such that the original
character data is preserved and as necessary the unicode string can be
converted back to bytes and thence decoded to unicode again using a
different encoding.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application should be a byte
string. Where native strings are unicode strings, the native string
type can also be returned in which case it would be encoded as
ISO-8859-1.

6. The list of response headers specified by the WSGI application
should contain tuples consisting of two values, where each value is a
byte string. Where native strings are unicode strings, the native
string type can also be returned in which case it would be encoded as
ISO-8859-1.

7. The iterable returned by the application and from which response
content is derived, should yield byte strings. Where native strings
are unicode strings, the native string type can also be returned in
which case it would be encoded as ISO-8859-1.
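
The ISO-8859-1 round trip in point 3, and the encoding of native
strings in points 5 to 7, can be sketched roughly as below. The
function names are hypothetical and exist only for this illustration;
the sample value assumes a client that sent UTF-8.

```python
def decode_cgi_value(raw):
    """Server side: expose raw request bytes as a native (unicode) string.

    ISO-8859-1 maps each byte to one code point, so nothing is lost.
    """
    return raw.decode("iso-8859-1")


def recode(value, encoding="utf-8"):
    """Application side: go back to the original bytes, then decode again
    using the encoding the application actually expects."""
    return value.encode("iso-8859-1").decode(encoding)


def to_wire(value):
    """Server side: accept bytes as is, encode native strings as ISO-8859-1."""
    if isinstance(value, bytes):
        return value
    return value.encode("iso-8859-1")


# Bytes as they might arrive for a UTF-8 encoded path.
raw = "/caf\u00e9".encode("utf-8")
as_latin1 = decode_cgi_value(raw)   # wrong characters, but every byte preserved
as_utf8 = recode(as_latin1)         # the text the client actually sent
```

The intermediate string looks wrong if printed, but it carries every
original byte, which is what lets the application pick a different
encoding afterwards.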

Even though we call it unicode, it actually has bytes in places as
well. The key issue over bytes vs unicode has been the values in the
dictionary, but as pointed out above, it is not clear whether for the
bytes option we are talking about bytes for keys as well, and for the
value of 'wsgi.url_scheme'.

So, can we clarify this first? And if you are going to comment, for
that extra clarity, cut and paste my definition #2 above and make the
changes to it so we have the full definition, rather than just
referring to bits. That way people who come and read this don't have
to trawl through the whole email chain to derive the context.

Once we get that clarification, then we can perhaps discuss
exclusively any issues people have with that bytes definition. That is
before we even try to balance it against the unicode option or look at
other WSGI 2 changes such as dropping start_response and
wsgi.file_wrapper.

And I apologise in advance if I start getting cranky and people think
I am trying to hijack the conversation. I want a solution more than
probably anyone else, as I can't fix up mod_wsgi until there is one,
and right now I am feeling pretty unmotivated towards doing anything
with mod_wsgi at all, even non-Python 3.X enhancements, because of all
this. So, if we can keep focus and try going one step at a time, maybe
I will not go ballistic. ;-)

Graham

