From davidgshi at yahoo.co.uk  Tue Aug  3 14:41:59 2010
From: davidgshi at yahoo.co.uk (David Shi)
Date: Tue, 3 Aug 2010 12:41:59 +0000 (GMT)
Subject: [Web-SIG] WAP communicating with server-side Python
Message-ID: <536331.71732.qm@web26304.mail.ukl.yahoo.com>

Is there an equivalent mailing list for WAP?
I am in need of a very simple demo website/webpage that is accessible by mobile
handset and simply gets a few critical data, e.g.
its id, and/or x, y, z position.
http://www.google.co.uk/search?hl=en&q=how+to+mobile+website&aq=8&aqi=g10&aql=&oq=how+to+mobile&gs_rfai=


I want to try out moving my Python internet service on to mobile phones.
Therefore, the front-end will have to be in WAP.


All I need is an excellent demo WAP page to show me how to get the mobile 
handset's ID, and x, y, z position.

Regards.

David



From armin.ronacher at active-4.com  Fri Aug 27 01:37:39 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Fri, 27 Aug 2010 01:37:39 +0200
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
Message-ID: <4C76FAC3.5010801@active-4.com>

Hi,

Is there a status update on this that I missed?  Did someone decide on 
bytes for the environment values, or are we still unsure about that?

In a recent discussion I had with Graham on #pocoo, it seemed that he has 
lost interest in supporting WSGI on Python 3 for the time being due to 
lack of interest.

My personal pet project of actively redesigning WSGI to see if a 
higher-level protocol would solve the unicode issue better failed and 
was not worth the effort.

As I understand it, Python 3.0/1/2 will be broken for WSGI anyway, so we 
can stop caring about the stdlib.

CherryPy seems to be the only system currently with an actively 
maintained Python 3 version of WSGI which from my understanding is based 
on unicode and bytes, where unicode is seen as latin1.

At this point I don't care at all about what is decided on as long as 
something is decided.  Can someone please stand up and just do that? :)


Regards,
Armin

From pje at telecommunity.com  Fri Aug 27 05:45:51 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 26 Aug 2010 23:45:51 -0400
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <4C76FAC3.5010801@active-4.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
Message-ID: <20100827034601.51C3C3A40A4@sparrow.telecommunity.com>

At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
>Hi,
>
>Is there a status update on this that I missed?  Did someone decide 
>on bytes for the environment values, or are we still unsure about that?

To the extent we're "unsure", I think the holdup is simply that 
nobody has tried doing an all-bytes WSGI implementation -- unless of 
course you count all our Python 2.x experience as experience with an 
all-bytes implementation.  ;-)

(Of course, that experience won't help us with Python 3 stdlib issues.)


>At this point I don't care at all about what is decided on as long 
>as something is decided.  Can someone please stand up and just do that? :)

Essentially the problem right now is that unless such a choice is 
made, there's little hope of getting the stdlib issues to be 
resolved, because we can't exactly file bug reports against the 
stdlib if we don't know what we want it to do.  ;-)

My personal inclination is to define WSGI 2 as a bytes-oriented 
protocol, and then encourage people to port to WSGI 2 before moving 
to Python 3.

In theory, if we did it correctly it could actually minimize the 
porting pain for Python 3.

In practice, I'm not sure how to do this, as I lack experience with 
2to3 at the moment, or any production experience with Python 3 whatsoever.


From graham.dumpleton at gmail.com  Fri Aug 27 06:17:09 2010
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 27 Aug 2010 14:17:09 +1000
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
Message-ID: <AANLkTim4ueXz103NsVuw88ikwbP=B5U_+c1m8OQVO1fc@mail.gmail.com>

On 27 August 2010 13:45, P.J. Eby <pje at telecommunity.com> wrote:
> At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
>>
>> Hi,
>>
>> Is there a status update on this that I missed?  Did someone decide on
>> bytes for the environment values, or are we still unsure about that?
>
> To the extent we're "unsure", I think the holdup is simply that nobody has
> tried doing an all-bytes WSGI implementation -- unless of course you count
> all our Python 2.x experience as experience with an all-bytes
> implementation.  ;-)
>
> (Of course, that experience won't help us with Python 3 stdlib issues.)
>
>
>> At this point I don't care at all about what is decided on as long as
>> something is decided.  Can someone please stand up and just do that? :)
>
> Essentially the problem right now is that unless such a choice is made,
> there's little hope of getting the stdlib issues to be resolved, because we
> can't exactly file bug reports against the stdlib if we don't know what we
> want it to do.  ;-)
>
> My personal inclination is to define WSGI 2 as a bytes-oriented protocol,
> and then encourage people to port to WSGI 2 before moving to Python 3.

Since the major stumbling block to any sort of agreement, irrespective
of other changes, is still bytes vs unicode, and since we have a
reasonably clear definition of what the unicode suggestion is, can we
please as a first step get a definition of what bytes actually implies,
so everyone knows what we are talking about? I specifically ask this
because it isn't clear: people don't explain in detail what they
mean when they say 'bytes'.

Going back to my definition #2 in my blog post from a year ago, I had:

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application must be a byte string.

6. The list of response headers specified by the WSGI application must
contain tuples consisting of two values, where each value is a byte
string.

7. The iterable returned by the application and from which response
content is derived, must yield byte strings.
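
To make the seven points above concrete, here is a minimal sketch of an
application written against this bytes definition. It assumes Python 3 and
keeps the existing start_response callable; the names app and call_app and
the two-key environ are made up for illustration and do not come from any
real server.

```python
# Sketch of definition #2: native-string keys, bytes CGI values,
# bytes status line, bytes headers, bytes body. All names are illustrative.

def app(environ, start_response):
    path = environ['PATH_INFO']            # CGI value: bytes (point 3)
    scheme = environ['wsgi.url_scheme']    # native string (point 2)
    body = b'You asked for ' + path + b' over ' + scheme.encode('iso-8859-1')
    start_response(b'200 OK', [                       # points 5 and 6
        (b'Content-Type', b'text/plain'),
        (b'Content-Length', str(len(body)).encode('ascii')),
    ])
    return [body]                                     # point 7


def call_app(application, environ):
    """Tiny stand-in for a server: run the app and collect what it produced."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    captured['body'] = b''.join(application(environ, start_response))
    return captured


result = call_app(app, {
    'PATH_INFO': b'/hello',        # keys are native strings (point 1)
    'wsgi.url_scheme': 'http',
})
```

Note that the application has to encode 'wsgi.url_scheme' before mixing it
with the bytes path, which is the kind of friction behind the disagreement
over point (2).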

The points of disagreement I have seen about this are as follows.

For (1), the keys should also be bytes, including names of 'wsgi.' special keys.

For (2), the value of 'wsgi.url_scheme' should be bytes.

So, do you really want bytes absolutely everywhere, or are keys still
going to be unicode taken as ISO-8859-1?

Note that we are not agreeing to the final solution here, just what
bytes means in contrast to the unicode option, so we know that we are
comparing only two options and not many options because people have
different interpretations of what bytes means.

By contrast, what we generally mean by the unicode option is
definition #3 from my blog post. That being:

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are native strings. Where native strings are unicode
strings, ISO-8859-1 encoding would be used such that the original
character data is preserved and as necessary the unicode string can be
converted back to bytes and thence decoded to unicode again using a
different encoding.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application should be a byte
string. Where native strings are unicode strings, the native string
type can also be returned in which case it would be encoded as
ISO-8859-1.

6. The list of response headers specified by the WSGI application
should contain tuples consisting of two values, where each value is a
byte string. Where native strings are unicode strings, the native
string type can also be returned in which case it would be encoded as
ISO-8859-1.

7. The iterable returned by the application and from which response
content is derived, should yield byte strings. Where native strings
are unicode strings, the native string type can also be returned in
which case it would be encoded as ISO-8859-1.
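
The round trip described in point 3 can be sketched as follows. This is only
an illustration in Python 3, assuming a client that sent UTF-8; the variable
names are invented for the example.

```python
# Bytes as they might arrive on the wire from a client that used UTF-8.
raw = '\u00d6zkan'.encode('utf-8')

# Under the unicode option the server decodes every value as ISO-8859-1.
# The result looks wrong as text, but no byte is lost: each byte maps to
# exactly one code point in the range U+0000..U+00FF.
as_native = raw.decode('iso-8859-1')

# The application can undo that step and decode with the real encoding.
recovered = as_native.encode('iso-8859-1').decode('utf-8')

# The same holds for every possible byte value, not just this sample.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode('iso-8859-1').encode('iso-8859-1')
```

The cost is that the application has to know which values still need this
second step, since the native string it receives gives no hint.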

Even though we call it unicode, it actually has bytes in places as well.
The key issue over bytes vs unicode has been the values in the
dictionary but, as pointed out above, it is not clear whether for the bytes
option we are talking about bytes for keys as well, and for the value of
'wsgi.url_scheme'.

So, can we clarify this first? And if you are going to comment,
for that extra clarity, cut and paste my definition #2 above and make
the changes to it so we have the full definition, rather than just
referring to bits. That way people who come and read this don't have
to trawl through the whole email chain to derive the context.

Once we get that clarification, then we can perhaps discuss
exclusively any issues people have with that bytes definition. That is
before we even try to balance it against the unicode option or look at
other WSGI 2 changes such as dropping start_response and
wsgi.file_wrapper.

And I apologise in advance if I start getting cranky and people think
I am trying to hijack the conversation. I want a solution more so than
probably anyone else, as I can't fix up mod_wsgi until there is one, and
right now I am feeling pretty unmotivated towards doing anything with
mod_wsgi at all, even non-Python 3.X enhancements, because of all this.
So, if we can keep focus and try going one step at a time, maybe I
will not go ballistic. ;-)

Graham

From armin.ronacher at active-4.com  Fri Aug 27 16:22:47 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Fri, 27 Aug 2010 16:22:47 +0200
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
Message-ID: <4C77CA37.1050603@active-4.com>

Hi,

On 2010-08-27 5:45 AM, P.J. Eby wrote:
 > At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
 > To the extent we're "unsure", I think the holdup is simply that nobody
 > has tried doing an all-bytes WSGI implementation -- unless of course you
 > count all our Python 2.x experience as experience with an all-bytes
 > implementation. ;-)
I have a private branch of Werkzeug that is all bytes only.  Untested 
unfortunately because porting the testsuite over is a huge task on its 
own and not all parts work properly yet.  But it's okayish.

Werkzeug does not use anything from the standard library in the latest 
version except urljoin from the url parse package which I would have to 
rewrite for my little experiment.  In my attempt to port it I'm doing 
the encode/decode dance in a wrapper function.

 > In theory, if we did it correctly it could actually minimize the porting
 > pain for Python 3.
 >
 > In practice, I'm not sure how to do this, as I lack experience with 2to3
 > at the moment, or any production experience with Python 3 whatsoever.
The big problem for me is that we *will* have to run 2to3 because 
WSGI sometimes leaks from the framework to the application.  This is 
especially true for Django, where request.META is passed directly as the 
WSGI environment to the user and no accessor functions exist.  So 
everybody is parsing the headers themselves there.

So when frameworks are starting to support any version of WSGI on Python 
3 they will also have to ship custom 2to3 fixers that add tiny shims for 
decoding/encoding either side of comparisons etc.

For example it's pretty common to see stuff like this:

     if 'msie' in request.META.get('HTTP_USER_AGENT', '').lower():

For an all bytes approach a tool would have to recognize that this is 
from a WSGI environment and change the code to this:

     if b'msie' in request.META.get('HTTP_USER_AGENT', b'').lower():

That's not impossible to do and in my mind the right decision, but it 
also means extra work to be done.  And if extra work is required when 
porting a framework and application over to Python 3 we could reward the 
people doing that with improvements of the specification itself.
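
A small sketch of why the rewrite is needed, assuming Python 3 and an
all-bytes environment. The user agent value is an invented sample and a plain
variable stands in for request.META.

```python
# Sample header value as an all-bytes environ would deliver it.
user_agent = b'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)'

# The unported test mixes a text literal with a bytes value, which
# Python 3 refuses instead of silently returning False.
try:
    'msie' in user_agent.lower()
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The ported form compares bytes with bytes.
ported_ok = b'msie' in user_agent.lower()
```

Because the failure is an exception rather than a wrong answer, unported
application code at least breaks loudly, but every such comparison still has
to be found and changed.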

I'm thinking about improving file_wrapper (so that middlewares can 
either detect that a file_wrapper is here and they should not consume 
the app iter, or just replacing it with a custom header), the input 
stream etc.


Regards,
Armin

From cito at online.de  Fri Aug 27 18:05:15 2010
From: cito at online.de (Christoph Zwerschke)
Date: Fri, 27 Aug 2010 18:05:15 +0200
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <4C77CA37.1050603@active-4.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>	<4C76FAC3.5010801@active-4.com>	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
	<4C77CA37.1050603@active-4.com>
Message-ID: <4C77E23B.7090108@online.de>

Am 27.08.2010 16:22 schrieb Armin Ronacher:
 > For an all bytes approach a tool would have to recognize that this is
 > from a WSGI environment and change the code to this:
 >
 > if b'msie' in request.META.get('HTTP_USER_AGENT', b'').lower():

Btw, another problem with this is that the lower() method does not know 
that it has to use latin1 when lowercasing. For instance,

user = 'özkan'.encode('latin1')
if user in request.META.get('REMOTE_USER', b'').lower():

will not work if the user has logged in as 'Özkan'.
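
The effect can be checked in isolation with the sketch below. It assumes
Python 3 and uses Ö/ö purely as an example of a non-ASCII Latin-1 letter;
plain variables stand in for the REMOTE_USER lookup.

```python
logged_in_as = '\u00d6zkan'.encode('latin1')   # b'\xd6zkan', capital O-umlaut
wanted = '\u00f6zkan'.encode('latin1')         # b'\xf6zkan', small o-umlaut

# bytes.lower() only touches ASCII letters, so byte 0xD6 is left alone
# and the lowercased login still does not contain the wanted name.
bytes_match = wanted in logged_in_as.lower()

# Decoding first lets str.lower() apply the full Unicode case mapping.
text_match = wanted.decode('latin1') in logged_in_as.decode('latin1').lower()
```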

-- Christoph

From pje at telecommunity.com  Fri Aug 27 18:27:08 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 27 Aug 2010 12:27:08 -0400
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <4C77E23B.7090108@online.de>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
	<4C77CA37.1050603@active-4.com> <4C77E23B.7090108@online.de>
Message-ID: <20100827162719.746463A409E@sparrow.telecommunity.com>

At 06:05 PM 8/27/2010 +0200, Christoph Zwerschke wrote:
>  For instance,
>
>user = 'özkan'.encode('latin1')
>if user in request.META.get('REMOTE_USER', b'').lower():
>
>will not work if the user has logged in as 'Özkan'.

Isn't that a problem with code that does this now? 


From pje at telecommunity.com  Fri Aug 27 19:01:56 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 27 Aug 2010 13:01:56 -0400
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <AANLkTim4ueXz103NsVuw88ikwbP=B5U_+c1m8OQVO1fc@mail.gmail.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
	<AANLkTim4ueXz103NsVuw88ikwbP=B5U_+c1m8OQVO1fc@mail.gmail.com>
Message-ID: <20100827170206.5D0293A409E@sparrow.telecommunity.com>

At 02:17 PM 8/27/2010 +1000, Graham Dumpleton wrote:
>Since the major stumbling block to any sort of agreement, irrespective
>of other changes, is still bytes vs unicode, and since we have a
>reasonably clear definition of what the unicode suggestion is, can we
>please as a first step get a definition of what bytes actually implies,
>so everyone knows what we are talking about? I specifically ask this
>because it isn't clear: people don't explain in detail what they
>mean when they say 'bytes'.
>
>Going back to my definition #2 in my blog post from a year ago, I had:
>
>1. The application is passed an instance of a Python dictionary
>containing what is referred to as the WSGI environment. All keys in
>this dictionary are native strings. For CGI variables, all names are
>going to be ISO-8859-1 and so where native strings are unicode
>strings, that encoding is used for the names of CGI variables

FYI, one thing that's changed here is the existence of os.environb in 
Python 3.2, at least on non-Windows OSes.
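
For reference, a short sketch of what os.environb offers, assuming Python 3.2
or later. The variable name DEMO_WSGI_VAR is made up for the example; the
guard is needed because os.environb only exists where
os.supports_bytes_environ is true.

```python
import os

if os.supports_bytes_environ:
    # Keys and values are both bytes in os.environb.
    os.environb[b'DEMO_WSGI_VAR'] = b'caf\xc3\xa9'
    raw_value = os.environb[b'DEMO_WSGI_VAR']
    # os.environ is kept in sync and exposes the same entry as text.
    text_value = os.environ['DEMO_WSGI_VAR']
else:
    # Platforms with a natively unicode environment (e.g. Windows).
    raw_value = text_value = None
```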


>2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>environment, the value of the variable should be a native string.

Since any meaningful use of this value is going to end up needing to 
be bytes again (e.g. Location headers), and for consistency's sake, I 
lean towards saying this is bytes too.


>3. For the CGI variables contained in the WSGI environment, the values
>of the variables are byte strings.
>
>4. The WSGI input stream 'wsgi.input' contained in the WSGI
>environment and from which request content is read, should yield byte
>strings.
>
>5. The status line specified by the WSGI application must be a byte string.
>
>6. The list of response headers specified by the WSGI application must
>contain tuples consisting of two values, where each value is a byte
>string.
>
>7. The iterable returned by the application and from which response
>content is derived, must yield byte strings.
>
>The points of disagreement I have seen about this are as follows.
>
>For (1), the keys should also be bytes, including names of 'wsgi.' 
>special keys.
>
>For (2), the value of 'wsgi.url_scheme' should be bytes.
>
>So, do you really want bytes absolutely everywhere, or are keys still
>going to be unicode taken as ISO-8859-1?

If we follow the example of os.environb, then the keys have to be bytes also.

However, I can already see that the big problem with all of this is 
that WSGI code is going to be littered with a plague of "b"s hanging 
off the front of every string literal, and that 2to3 is probably not 
going to handle it correctly.  Making the keys bytes as well just 
multiplies the problem.


>Note that we are not agreeing to the final solution here, just what
>bytes means in contrast to the unicode option, so we know that we are
>comparing only two options and not many options because people have
>different interpretations of what bytes means.
>
>By contrast, what we generally mean by the unicode option is
>definition #3 from my blog post. That being:
>
>1. The application is passed an instance of a Python dictionary
>containing what is referred to as the WSGI environment. All keys in
>this dictionary are native strings. For CGI variables, all names are
>going to be ISO-8859-1 and so where native strings are unicode
>strings, that encoding is used for the names of CGI variables
>
>2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
>environment, the value of the variable should be a native string.
>
>3. For the CGI variables contained in the WSGI environment, the values
>of the variables are native strings. Where native strings are unicode
>strings, ISO-8859-1 encoding would be used such that the original
>character data is preserved and as necessary the unicode string can be
>converted back to bytes and thence decoded to unicode again using a
>different encoding.
>
>4. The WSGI input stream 'wsgi.input' contained in the WSGI
>environment and from which request content is read, should yield byte
>strings.
>
>5. The status line specified by the WSGI application should be a byte
>string. Where native strings are unicode strings, the native string
>type can also be returned in which case it would be encoded as
>ISO-8859-1.
>
>6. The list of response headers specified by the WSGI application
>should contain tuples consisting of two values, where each value is a
>byte string. Where native strings are unicode strings, the native
>string type can also be returned in which case it would be encoded as
>ISO-8859-1.
>
>7. The iterable returned by the application and from which response
>content is derived, should yield byte strings. Where native strings
>are unicode strings, the native string type can also be returned in
>which case it would be encoded as ISO-8859-1.
>
>Even though we call it unicode, it actually has bytes in places as well.
>The key issue over bytes vs unicode has been the values in the
>dictionary but, as pointed out above, it is not clear whether for the bytes
>option we are talking about bytes for keys as well, and for the value of
>'wsgi.url_scheme'.

The main issue I have with this option is that it seems to make it 
trivially easy to write an app or piece of middleware that seems to 
work correctly most of the time, unless placed in the right 
combination with other apps or middleware.

More precisely, an updated wsgiref.validate module used to check the 
"unicode option" would mark such apps and middleware as perfectly 
spec-conformant, yet this spec-conformance would not be transitive - 
i.e., you couldn't say that an assembly of spec-conformant middleware 
and apps would be correct.

Hmmm...  unless...  I guess the only way to be really sure would be 
if the validation process randomly changed the types of input and 
output values to both ways allowed by the spec, and verified that the 
results were still compliant.  ;-)

(In practice, I expect that getting it to do that would be rather 
difficult, though.)

Let me see if I can more precisely narrow down my concern.

Mostly, it boils down to the possibility of non-latin1 unicode 
"escaping" into the output stream...  so if #5, #6 and #7 above were 
changed to bytes-only outputs, then an updated validator can enforce 
those criteria, making spec-compliance verification 
composable.  (That is, if you combine two things that are verified 
compliant, the combination is also known to be compliant.)

So, I could actually support a format that was "unicode (latin1) 
headers in, bytes headers out", and "bytes stream in, bytes stream out".

You can then concentrate all your encoding or decoding operations at 
one place, or even write a decorator to take care of it for you.
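
A rough sketch of such a decorator, assuming Python 3 and the existing
start_response signature. The names bytes_out and _to_bytes are hypothetical,
and the sketch only covers the status line and headers, leaving the body
stream as bytes in, bytes out.

```python
import functools


def _to_bytes(value):
    """Encode native strings as ISO-8859-1 and pass bytes through unchanged."""
    if isinstance(value, str):
        # Raises UnicodeEncodeError for non-latin1 text, so bad data is
        # caught here instead of escaping into the output stream.
        return value.encode('iso-8859-1')
    return value


def bytes_out(application):
    """Let the wrapped app use native strings for status and headers while
    the server only ever sees bytes."""
    @functools.wraps(application)
    def wrapper(environ, start_response):
        def encoding_start_response(status, headers, exc_info=None):
            status_b = _to_bytes(status)
            headers_b = [(_to_bytes(n), _to_bytes(v)) for n, v in headers]
            return start_response(status_b, headers_b, exc_info)
        return application(environ, encoding_start_response)
    return wrapper


@bytes_out
def hello(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


seen = {}


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the server side; records what it was given.
    seen['status'], seen['headers'] = status, headers


body = b''.join(hello({}, fake_start_response))
```

With the encoding concentrated in one wrapper, a validator only has to check
that everything crossing the boundary is bytes.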


>So, can we clarify this first? And if you are going to comment,
>for that extra clarity, cut and paste my definition #2 above and make
>the changes to it so we have the full definition, rather than just
>referring to bits. That way people who come and read this don't have
>to trawl through the whole email chain to derive the context.
>
>Once we get that clarification, then we can perhaps discuss
>exclusively any issues people have with that bytes definition. That is
>before we even try to balance it against the unicode option or look at
>other WSGI 2 changes such as dropping start_response and
>wsgi.file_wrapper.
>
>And I apologise in advance if I start getting cranky and people think
>I am trying to hijack the conversation. I want a solution more so than
>probably anyone else, as I can't fix up mod_wsgi until there is one, and
>right now I am feeling pretty unmotivated towards doing anything with
>mod_wsgi at all, even non-Python 3.X enhancements, because of all this.
>So, if we can keep focus and try going one step at a time, maybe I
>will not go ballistic. ;-)

Thanks for hanging in there, and also for posting this summary!


From cito at online.de  Fri Aug 27 19:08:43 2010
From: cito at online.de (Christoph Zwerschke)
Date: Fri, 27 Aug 2010 19:08:43 +0200
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <20100827162719.746463A409E@sparrow.telecommunity.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
	<4C77CA37.1050603@active-4.com> <4C77E23B.7090108@online.de>
	<20100827162719.746463A409E@sparrow.telecommunity.com>
Message-ID: <4C77F11B.2020907@online.de>

Am 27.08.2010 18:27 schrieb P.J. Eby:
 > At 06:05 PM 8/27/2010 +0200, Christoph Zwerschke wrote:
 >> user = 'özkan'.encode('latin1')
 >> if user in request.META.get('REMOTE_USER', b'').lower():
 >>
 >> will not work if the user has logged in as 'Özkan'.
 >
 > Isn't that a problem with code that does this now?

You mean in Python 2? If the locale is set properly, lower() will 
account for non-ascii. I don't think Python 3 does this with bytes.

-- Christoph

From paul.joseph.davis at gmail.com  Fri Aug 27 21:26:53 2010
From: paul.joseph.davis at gmail.com (Paul Davis)
Date: Fri, 27 Aug 2010 15:26:53 -0400
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <AANLkTim4ueXz103NsVuw88ikwbP=B5U_+c1m8OQVO1fc@mail.gmail.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
	<AANLkTim4ueXz103NsVuw88ikwbP=B5U_+c1m8OQVO1fc@mail.gmail.com>
Message-ID: <AANLkTinuZG+55DFT=ABcZ4f3orGPEbqotYSm08Jfn+gN@mail.gmail.com>

On Fri, Aug 27, 2010 at 12:17 AM, Graham Dumpleton
<graham.dumpleton at gmail.com> wrote:
> On 27 August 2010 13:45, P.J. Eby <pje at telecommunity.com> wrote:
>> At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote:
>>>
>>> Hi,
>>>
>>> Is there a status update on this that I missed?  Did someone decide on
>>> bytes for the environment values, or are we still unsure about that?
>>
>> To the extent we're "unsure", I think the holdup is simply that nobody has
>> tried doing an all-bytes WSGI implementation -- unless of course you count
>> all our Python 2.x experience as experience with an all-bytes
>> implementation.  ;-)
>>
>> (Of course, that experience won't help us with Python 3 stdlib issues.)
>>
>>
>>> At this point I don't care at all about what is decided on as long as
>>> something is decided.  Can someone please stand up and just do that? :)
>>
>> Essentially the problem right now is that unless such a choice is made,
>> there's little hope of getting the stdlib issues to be resolved, because we
>> can't exactly file bug reports against the stdlib if we don't know what we
>> want it to do.  ;-)
>>
>> My personal inclination is to define WSGI 2 as a bytes-oriented protocol,
>> and then encourage people to port to WSGI 2 before moving to Python 3.
>
> Since the major stumbling block to any sort of agreement, irrespective
> of other changes, is still bytes vs unicode, and since we have a
> reasonably clear definition of what the unicode suggestion is, can we
> please as a first step get a definition of what bytes actually implies,
> so everyone knows what we are talking about? I specifically ask this
> because it isn't clear: people don't explain in detail what they
> mean when they say 'bytes'.
>
> Going back to my definition #2 in my blog post from a year ago, I had:
>
> 1. The application is passed an instance of a Python dictionary
> containing what is referred to as the WSGI environment. All keys in
> this dictionary are native strings. For CGI variables, all names are
> going to be ISO-8859-1 and so where native strings are unicode
> strings, that encoding is used for the names of CGI variables
>
> 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
> environment, the value of the variable should be a native string.
>
> 3. For the CGI variables contained in the WSGI environment, the values
> of the variables are byte strings.
>
> 4. The WSGI input stream 'wsgi.input' contained in the WSGI
> environment and from which request content is read, should yield byte
> strings.
>
> 5. The status line specified by the WSGI application must be a byte string.
>
> 6. The list of response headers specified by the WSGI application must
> contain tuples consisting of two values, where each value is a byte
> string.
>
> 7. The iterable returned by the application and from which response
> content is derived, must yield byte strings.
>
> The points of disagreement I have seen about this are as follows.
>
> For (1), the keys should also be bytes, including names of 'wsgi.' special keys.
>
> For (2), the value of 'wsgi.url_scheme' should be bytes.
>
> So, do you really want bytes absolutely everywhere, or are keys still
> going to be unicode taken as ISO-8859-1?
>
> Note that we are not agreeing to the final solution here, just what
> bytes means in contrast to the unicode option, so we know that we are
> comparing only two options and not many options because people have
> different interpretations of what bytes means.
>
> By contrast, what we generally mean by the unicode option is
> definition #3 from my blog post. That being:
>
> 1. The application is passed an instance of a Python dictionary
> containing what is referred to as the WSGI environment. All keys in
> this dictionary are native strings. For CGI variables, all names are
> going to be ISO-8859-1 and so where native strings are unicode
> strings, that encoding is used for the names of CGI variables
>
> 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
> environment, the value of the variable should be a native string.
>
> 3. For the CGI variables contained in the WSGI environment, the values
> of the variables are native strings. Where native strings are unicode
> strings, ISO-8859-1 encoding would be used such that the original
> character data is preserved and as necessary the unicode string can be
> converted back to bytes and thence decoded to unicode again using a
> different encoding.
>
> 4. The WSGI input stream 'wsgi.input' contained in the WSGI
> environment and from which request content is read, should yield byte
> strings.
>
> 5. The status line specified by the WSGI application should be a byte
> string. Where native strings are unicode strings, the native string
> type can also be returned in which case it would be encoded as
> ISO-8859-1.
>
> 6. The list of response headers specified by the WSGI application
> should contain tuples consisting of two values, where each value is a
> byte string. Where native strings are unicode strings, the native
> string type can also be returned in which case it would be encoded as
> ISO-8859-1.
>
> 7. The iterable returned by the application and from which response
> content is derived, should yield byte strings. Where native strings
> are unicode strings, the native string type can also be returned in
> which case it would be encoded as ISO-8859-1.
>
> Even though we call it unicode, it actually has bytes in places as well.
> The key issue over bytes vs unicode has been the values in the
> dictionary but, as pointed out above, it is not clear whether for the bytes
> option we are talking about bytes for keys as well, and for the value of
> 'wsgi.url_scheme'.
>
> So, can we clarify this first? And if you are going to comment,
> for that extra clarity, cut and paste my definition #2 above and make
> the changes to it so we have the full definition, rather than just
> referring to bits. That way people who come and read this don't have
> to trawl through the whole email chain to derive the context.
>
> Once we get that clarification, then we can perhaps discuss
> exclusively any issues people have with that bytes definition. That is
> before we even try to balance it against the unicode option or look at
> other WSGI 2 changes such as dropping start_response and
> wsgi.file_wrapper.
>
> And I apologise in advance if I start getting cranky and people think
> I am trying to hijack the conversation. I want a solution more so than
> probably anyone else as I can't fix up mod_wsgi until there is one, and
> right now I am feeling pretty unmotivated towards doing anything with
> mod_wsgi at all, even non Python 3.X enhancements, because of all this.
> So, if we can keep focus and try going one step at a time, maybe I
> will not go ballistic. ;-)
>
> Graham
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/paul.joseph.davis%40gmail.com
>

I ran into this while I was attempting to put together enough code to
play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has
deftly pointed out, it's a pretty big pain in the rear.

Specifically, if we specify that all keys in the environ dictionary
are byte strings, then there's a noticeable amount of pain in trying
to write code that runs on both platforms. I object to 2to3.py on
religious grounds, so when I was implementing this I was doing so with
code that would run unmodified on both 2 and 3.

What I ran into is that if you want to support older than 2.6, all
environ key lookups must be wrapped with a helper function. This makes
code that uses the dict full of things like
environ[b("wsgi.errors")].write(b("some message")) where b is a helper
I wrote to convert to the right type for a given interpreter. And I'm
still not sure how Jython works with strings. PEP 333 says it's unicode
only, which makes me wonder how they would react to the bytes
everywhere approach.
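
For readers who have not seen such a helper, here is a minimal sketch of the kind of `b` function described above. It is not the code from wsgiref2; the `ErrorLog` class is a hypothetical stand-in for whatever stream a server would supply under 'wsgi.errors', and the Latin-1 choice in `b` is an assumption.

```python
import sys

if sys.version_info[0] >= 3:
    def b(s):
        """Turn a native (unicode) literal into a byte string."""
        # Assumption: literals passed to b() are Latin-1 representable.
        return s.encode("latin-1")
else:
    def b(s):
        """On Python 2 the native str is already a byte string."""
        return s


class ErrorLog(object):
    """Hypothetical stand-in for a server's wsgi.errors stream."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


# Every key lookup and every literal has to go through b().
environ = {b("wsgi.errors"): ErrorLog()}
environ[b("wsgi.errors")].write(b("some message"))
```

The noise is visible even in three lines of use: each literal is wrapped, which is the pain point being described.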

I'm also not a big fan of automatically applying a default encoding to
*any* of the bytes read in an HTTP request. After contemplating for
a while I came to the conclusion that header names are really part of
the request itself, whereas the other keys in the environ are
metadata about the request. Having the two different types of data in
the same namespace seemed to be the root of the problem. So I
rearranged things so that there's an "http.headers" key that is a
dictionary with byte strings for keys and values.
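
A rough sketch of that layout, assuming Python 3 (or 2.6+) for the byte literals. The function name `build_environ` and the choice of native strings for the outer keys are assumptions made for illustration, not taken from the wsgiref2 code.

```python
def build_environ(header_pairs, url_scheme="http"):
    """Build a toy environ that keeps request headers apart from metadata.

    header_pairs: iterable of (name, value) byte-string pairs as read
    off the wire. The overall layout is an illustrative assumption.
    """
    headers = {}
    for name, value in header_pairs:
        # Header names are case-insensitive ASCII tokens, so lowercasing
        # the raw bytes needs no decoding step.
        headers[name.lower()] = value
    return {
        "wsgi.url_scheme": url_scheme,  # metadata supplied by the server
        "http.headers": headers,        # request data: bytes in, bytes out
    }


environ = build_environ([
    (b"Host", b"example.org"),
    (b"X-Name", b"caf\xc3\xa9"),  # value bytes are passed through untouched
])
```

With this split, no default encoding is ever applied to header data; the application decodes a value only when it knows which encoding applies.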

I haven't managed to find any time to write a test suite for the spec
I was toying with but I figure it's far enough along that it might be
interesting to someone. This code should be runnable on 2.5, 2.6 and
3.2. When I get back to working on it, my next goal was to figure out
a way to write the test suite in a way that it could run on any
implementation to test for compliance.

Code is at: http://github.com/davisp/wsgiref2

Paul Davis

From fumanchu at aminus.org  Fri Aug 27 22:04:06 2010
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 27 Aug 2010 13:04:06 -0700
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <AANLkTinuZG+55DFT=ABcZ4f3orGPEbqotYSm08Jfn+gN@mail.gmail.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com><4C76FAC3.5010801@active-4.com><20100827034601.51C3C3A40A4@sparrow.telecommunity.com><AANLkTim4ueXz103NsVuw88ikwbP=B5U_+c1m8OQVO1fc@mail.gmail.com>
	<AANLkTinuZG+55DFT=ABcZ4f3orGPEbqotYSm08Jfn+gN@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E640E535275@ex10.hostedexchange.local>

Paul Davis wrote:
> > Since the major stumbling block, irrespective of other changes,
> > to any sort of agreement is still bytes vs unicode
>
> I ran into this while I was attempting to put together enough code to
> play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has
> deftly pointed out, it's a pretty big pain in the rear.
> 
> Specifically, if we specify that all keys in the environ dictionary
> are byte strings, then there's a noticeable amount of pain in trying
> to write code that runs on both platforms. I object to 2to3.py on
> religious grounds, so when I was implementing this I was doing so with
> code that would run unmodified on both 2 and 3.

Religion is what gets us into this mess. Pragmatism will get us out. We
have two options:

 1. Continue to try to write code that runs unmodified on Python 2 and
3, or that runs when 2to3 is applied. There is a morass of corner cases
and state machines that behave differently depending on when you look at
them lurking here. You can all see where that is getting us: nowhere. By
the time you all discover how to write a spec that deals with all the
pain points which 2to3 introduces, Python 2 will be dead and you will
have wasted your time.
 2. Write a Python 3 version of your code. Yes, it's more drudge work.
Suck it up. To ameliorate that, make the Python 3 version the default as
soon as possible. Deprecate the Python 2 branch. Backport features as
necessary to the Python 2 branch (just as Python itself has been doing,
if you notice). If you do that, we can write a WSGI for Python 3 now
that doesn't suffer from any of the complexities of 2to3.


Robert Brewer
fumanchu at aminus.org

From paul.joseph.davis at gmail.com  Fri Aug 27 23:39:34 2010
From: paul.joseph.davis at gmail.com (Paul Davis)
Date: Fri, 27 Aug 2010 17:39:34 -0400
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E640E535275@ex10.hostedexchange.local>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>
	<AANLkTim4ueXz103NsVuw88ikwbP=B5U_+c1m8OQVO1fc@mail.gmail.com>
	<AANLkTinuZG+55DFT=ABcZ4f3orGPEbqotYSm08Jfn+gN@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E640E535275@ex10.hostedexchange.local>
Message-ID: <AANLkTimnREWEMh1fgeB=DErGw0qccpS-pneZTdQRPAQs@mail.gmail.com>

On Fri, Aug 27, 2010 at 4:04 PM, Robert Brewer <fumanchu at aminus.org> wrote:
> Paul Davis wrote:
>> > Since the major stumbling block, irrespective of other changes,
>> > to any sort of agreement is still bytes vs unicode
>>
>> I ran into this while I was attempting to put together enough code to
>> play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has
>> deftly pointed out, it's a pretty big pain in the rear.
>>
>> Specifically, if we specify that all keys in the environ dictionary
>> are byte strings, then there's a noticeable amount of pain in trying
>> to write code that runs on both platforms. I object to 2to3.py on
>> religious grounds, so when I was implementing this I was doing so with
>> code that would run unmodified on both 2 and 3.
>
> Religion is what gets us into this mess. Pragmatism will get us out. We
> have two options:
>
>  1. Continue to try to write code that runs unmodified on Python 2 and
> 3, or that runs when 2to3 is applied. There is a morass of corner cases
> and state machines that behave differently depending on when you look at
> them lurking here. You can all see where that is getting us: nowhere. By
> the time you all discover how to write a spec that deals with all the
> pain points which 2to3 introduces, Python 2 will be dead and you will
> have wasted your time.
>  2. Write a Python 3 version of your code. Yes, it's more drudge work.
> Suck it up. To ameliorate that, make the Python 3 version the default as
> soon as possible. Deprecate the Python 2 branch. Backport features as
> necessary to the Python 2 branch (just as Python itself has been doing,
> if you notice). If you do that, we can write a WSGI for Python 3 now
> that doesn't suffer from any of the complexities of 2to3.
>
>
> Robert Brewer
> fumanchu at aminus.org
>

No. What got us into this mess was the idea that it would be good to
silently type cast unicode objects into bytes. Perhaps I could've been
more clear on avoiding 2to3 though. I wanted to avoid coding any of
its oddities into a reference implementation because as you point out
it's just a source of confusion.

I'd like to point out that the code I posted works on both 2.x and
3.x. It's fairly easy to implement the backwards compatible code in
Python. There's nothing near the level of requiring a
branched/back-port strategy. Not to mention, a branched reference
implementation is a bit of a contradiction in terms. The hard part is
figuring out a specification that doesn't suck when people try to
implement it on multiple interpreters.

Also, I think you're overestimating the rate at which people are going
to be converting to Python 3. I still have people ask for Python 2.4
support. I wouldn't be the least bit surprised if there's a WSGI 3
before we deprecate 2.x support.

HTH,
Paul Davis

From armin.ronacher at active-4.com  Sat Aug 28 01:24:48 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Sat, 28 Aug 2010 01:24:48 +0200
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <4C77E23B.7090108@online.de>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>	<4C76FAC3.5010801@active-4.com>	<20100827034601.51C3C3A40A4@sparrow.telecommunity.com>	<4C77CA37.1050603@active-4.com>
	<4C77E23B.7090108@online.de>
Message-ID: <4C784940.6090805@active-4.com>

Hi,

On 2010-08-27 6:05 PM, Christoph Zwerschke wrote:
 > Btw, another problem with this is that the lower() method does not know
 > that it has to use latin1 when lowercasing.
That is not a problem insofar as case insensitive HTTP tokens are
limited to ASCII only.
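
A small sketch of why this holds, assuming Python 3 semantics, where `bytes.lower()` only touches ASCII letters. The sample values are made up for illustration.

```python
# An ASCII token lowercases identically as bytes or as decoded text.
token = b"Content-TYPE"
assert token.lower() == b"content-type"
assert token.decode("latin-1").lower() == "content-type"

# The two approaches only diverge for non-ASCII bytes, which cannot
# appear in a case-insensitive HTTP token. 0xC9 is an uppercase
# E-acute in Latin-1.
value = b"\xc9COLE"
lowered_bytes = value.lower()                    # 0xC9 left unchanged
lowered_text = value.decode("latin-1").lower()   # E-acute is lowercased
```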


Regards,
Armin

From g.brandl at gmx.net  Sat Aug 28 13:04:27 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 28 Aug 2010 13:04:27 +0200
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <4C76FAC3.5010801@active-4.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com>
Message-ID: <i5aqlm$5o4$1@dough.gmane.org>

Am 27.08.2010 01:37, schrieb Armin Ronacher:
> Hi,
> 
> Is there a status update on that now I missed?  Did something decide on 
> bytes for the environment values or are we still unsure about that?
> 
>  From a discussion lately I had with Graham on #pocoo it seems like he 
> lost interest on supporting WSGI on Python 3 for the time being due to 
> lack of interest.
> 
> My personal pet project of actively redesigning WSGI to see if a 
> higher-level protocol would solve the unicode issue better failed and 
> was not worth the effort.
> 
> As I understand Python 3.0/1/2 will be broken for WSGI anyways so we can 
> stop caring about the stdlib.

Let me just throw in here that it's *NOT* too late to do something about
Python 3.2.  It is not even in beta state yet, and I am very willing to
introduce the changes to make web programming work again, or even hold
up 3.2 for a bit if you need more time.

However, someone who actually *does* web programming has to do that, in
other words, one of you.  All I see is complaints that it will not work
and one has to forget the stdlib.  That is somewhat sad.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From armin.ronacher at active-4.com  Sat Aug 28 13:13:19 2010
From: armin.ronacher at active-4.com (Armin Ronacher)
Date: Sat, 28 Aug 2010 13:13:19 +0200
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <i5aqlm$5o4$1@dough.gmane.org>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>	<4C76FAC3.5010801@active-4.com>
	<i5aqlm$5o4$1@dough.gmane.org>
Message-ID: <4C78EF4F.3070802@active-4.com>

Hi,

On 2010-08-28 1:04 PM, Georg Brandl wrote:
> Let me just throw in here that it's *NOT* too late to do something about
> Python 3.2.  It is not even in beta state yet, and I am very willing to
> introduce the changes to make web programming work again, or even hold
> up 3.2 for a bit if you need more time.
Sorry if I was not clear.  I was talking about only wsgiref here.  And 
for that to be adapted to a possible new WSGI specification we would 
need more time than you can hold the 3.2 release I think.

> However, someone who actually *does* web programming has to do that, in
> other words, one of you.  All I see is complaints that it will not work
> and one has to forget the stdlib.  That is somewhat sad.
While I am not happy with the decisions of the stdlib for unicode in 
some parts, my mail was not related to that.


Regards,
Armin

From g.brandl at gmx.net  Sat Aug 28 13:12:37 2010
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 28 Aug 2010 13:12:37 +0200
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <4C78EF4F.3070802@active-4.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>	<4C76FAC3.5010801@active-4.com>	<i5aqlm$5o4$1@dough.gmane.org>
	<4C78EF4F.3070802@active-4.com>
Message-ID: <i5ar4v$6j1$1@dough.gmane.org>

Am 28.08.2010 13:13, schrieb Armin Ronacher:
> Hi,
> 
> On 2010-08-28 1:04 PM, Georg Brandl wrote:
>> Let me just throw in here that it's *NOT* too late to do something about
>> Python 3.2.  It is not even in beta state yet, and I am very willing to
>> introduce the changes to make web programming work again, or even hold
>> up 3.2 for a bit if you need more time.
> Sorry if I was not clear.  I was talking about only wsgiref here.  And 
> for that to be adapted to a possible new WSGI specification we would 
> need more time than you can hold the 3.2 release I think.

That is certainly true :)

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From ianb at colorstudy.com  Mon Aug 30 03:02:02 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 29 Aug 2010 20:02:02 -0500
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <i5ar4v$6j1$1@dough.gmane.org>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com> <i5aqlm$5o4$1@dough.gmane.org>
	<4C78EF4F.3070802@active-4.com> <i5ar4v$6j1$1@dough.gmane.org>
Message-ID: <AANLkTim9VE3uDFR+EeVbC3VF4mA5G3fYxvVUXD5RsrL-@mail.gmail.com>

Ugh... why are we back at bytes again?  I don't know of any concrete
problems with using Latin1 (basically how mod_wsgi works).  It would be nice
to try out some tricky cases -- cookie parsing, HTTP proxies,
output-modifying middleware, a few other cases.  But I don't see a reason to
expect they won't work.  It also doesn't feel particularly *wrong*.  The
parsed portions of the request and response are mostly ASCII anyway, and the
exceptions generally require wonky code anyway so a little transcoding isn't
so bad.

-- 
Ian Bicking  |  http://blog.ianbicking.org

From graham.dumpleton at gmail.com  Mon Aug 30 03:16:37 2010
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Mon, 30 Aug 2010 11:16:37 +1000
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <AANLkTim9VE3uDFR+EeVbC3VF4mA5G3fYxvVUXD5RsrL-@mail.gmail.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com> <i5aqlm$5o4$1@dough.gmane.org>
	<4C78EF4F.3070802@active-4.com> <i5ar4v$6j1$1@dough.gmane.org>
	<AANLkTim9VE3uDFR+EeVbC3VF4mA5G3fYxvVUXD5RsrL-@mail.gmail.com>
Message-ID: <AANLkTik3sr_=Zgg_TosOro4BEE8=gF2+nwTUfVn5JC9w@mail.gmail.com>

On 30 August 2010 11:02, Ian Bicking <ianb at colorstudy.com> wrote:
> Ugh... why are we back at bytes again?

Because no official decision, by way of a vote or even consensus, has
ever been made, the bytes option never goes away.

The problem with bytes, before one even tries to compare it to
text/unicode option, is that there is no clear description of what is
meant by the bytes option. For all I can see, there are potentially
multiple interpretations of what is meant by bytes.

Although I almost begged that, if we are going to discuss bytes
compared to text/unicode, agreement at least first be reached about
the definition of the bytes leaning option, that request has pretty
well fallen on deaf ears. Thus the discussion is yet again heading in
the direction of dithering, with a lot of navel gazing and not much
else.

As I brought up almost two years ago, if we are going to make any
progress on this, we are probably going to need a core group of people
nominated who can officially decide what is done, based on a proper
vote. This will be the only way there is going to be any sort of
acceptance of a decision. This idea that we can reach a consensus just
isn't working.

Graham

> I don't know of any concrete
> problems with using Latin1 (basically how mod_wsgi works).  It would be nice
> to try out some tricky cases -- cookie parsing, HTTP proxies,
> output-modifying middleware, a few other cases.  But I don't see a reason to
> expect they won't work.  It also doesn't feel particularly *wrong*.  The
> parsed portions of the request and response are mostly ASCII anyway, and the
> exceptions generally require wonky code anyway so a little transcoding isn't
> so bad.
>
> --
> Ian Bicking  |  http://blog.ianbicking.org
>

From pje at telecommunity.com  Mon Aug 30 05:07:49 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Sun, 29 Aug 2010 23:07:49 -0400
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <AANLkTik3sr_=Zgg_TosOro4BEE8=gF2+nwTUfVn5JC9w@mail.gmail.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com> <i5aqlm$5o4$1@dough.gmane.org>
	<4C78EF4F.3070802@active-4.com> <i5ar4v$6j1$1@dough.gmane.org>
	<AANLkTim9VE3uDFR+EeVbC3VF4mA5G3fYxvVUXD5RsrL-@mail.gmail.com>
	<AANLkTik3sr_=Zgg_TosOro4BEE8=gF2+nwTUfVn5JC9w@mail.gmail.com>
Message-ID: <20100830030802.747023A4100@sparrow.telecommunity.com>

At 11:16 AM 8/30/2010 +1000, Graham Dumpleton wrote:
>Although I almost begged that if we are going to discuss bytes,
>compared to text/unicode, that agreement at least first be made about
>the definition of the bytes leaning option, that request has pretty
>well fallen on deaf ears.

Did you not see my reply?  I (thought I) answered your question, and 
I actually also suggested that a variation of your unicode proposal 
might work, too.  See:

http://mail.python.org/pipermail/web-sig/2010-August/004545.html


From graham.dumpleton at gmail.com  Mon Aug 30 06:37:14 2010
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Mon, 30 Aug 2010 14:37:14 +1000
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <20100830030802.747023A4100@sparrow.telecommunity.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com> <i5aqlm$5o4$1@dough.gmane.org>
	<4C78EF4F.3070802@active-4.com> <i5ar4v$6j1$1@dough.gmane.org>
	<AANLkTim9VE3uDFR+EeVbC3VF4mA5G3fYxvVUXD5RsrL-@mail.gmail.com>
	<AANLkTik3sr_=Zgg_TosOro4BEE8=gF2+nwTUfVn5JC9w@mail.gmail.com>
	<20100830030802.747023A4100@sparrow.telecommunity.com>
Message-ID: <AANLkTikvGiRxKPG91OZppWs8SqR7spd7rddKvbya87M7@mail.gmail.com>

On 30 August 2010 13:07, P.J. Eby <pje at telecommunity.com> wrote:
> At 11:16 AM 8/30/2010 +1000, Graham Dumpleton wrote:
>>
>> Although I almost begged that if we are going to discuss bytes,
>> compared to text/unicode, that agreement at least first be made about
>> the definition of the bytes leaning option, that request has pretty
>> well fallen on deaf ears.
>
> Did you not see my reply?  I (thought I) answered your question, and I
> actually also suggested that a variation of your unicode proposal might
> work, too.  See:
>
> http://mail.python.org/pipermail/web-sig/2010-August/004545.html

I was purely asking about bytes and what that means to the people who
want to push it, setting aside the unicode option for the moment.

Others have pushed bytes in the past as well, but they haven't said
what they mean by it. I really wanted more input, given that past
discussions of the unicode leaning proposals between us core people
have been in part derailed by people who sit mostly on the sidelines
and start shouting 'I want bytes instead'. So, I want to give those
critics their chance to confirm what they mean by bytes, else we will
keep having them pop up time and time again when we are trying to
discuss other stuff. It is the lack of response beyond the usual
suspects that I am grumpy about.

Even in what you mention about bytes you are a bit fuzzy. Having the
value of wsgi.url_scheme be bytes is reasonable and I have no issue
with that, given that other URL components will be bytes as well, but
when you yourself mention keys, you are a bit unsure because of the
'b' plague. So there is still no clarity on that point, and if people
are going to keep raising bytes, I would like a better definition of
what they are talking about.

The only other person who has said anything about bytes is Armin, but
all that he really said was 'all bytes only'. This isn't much clearer
than when people have in the past said 'bytes everywhere' but in some
cases didn't actually mean keys. This is why I asked that people cut
and paste the definition I gave and change it to exactly what they
meant, so that nobody has to second guess. FWIW, from a separate
discussion I understand that Armin does mean bytes for keys.

So, I was really after that clarity so we can say without confusion
that our starting point from now on is that we have two overall
proposals, and that they be A and B as defined, with possibly even a C
and D if need be, not even using the labels bytes and unicode. We can
then discuss each in isolation as to whether, as defined, it would
work or not. From that, one or more might die, or might mutate further
and actually become closer to the other option, with all still being
valid options. Either way, people up till now have had this bytes vs
unicode divide stuck in their heads, when strictly speaking it isn't
necessarily pure bytes vs pure unicode, but merely a number of
different proposals, with certain bits in one case using unicode
instead of bytes.

Given that we have dedicated most time to the unicode leaning
solution, I would like to go and look properly at the bytes leaning
solutions now. That way we have the definitions and have also done the
analysis, and when people come along later and say 'bytes everywhere',
we have something proper to refer back to.

Anyway, rather than keep arguing the point, and so that we can move
forward, let us perhaps start now with the following definitions and
new names to identify them. We can even go a bit stupid and give each
its own code name so they are more memorable. Any next option based on
your suggestions about changing the WHEAT option can be called MAIZE.
And if you are thinking I am going stark raving mad and should be put
in a white jacket and locked up, you could well be right. I am not a
happy camper right now, but that is because of many things besides
this WSGI stuff. :-)

 And yes I know about the page that has been just recently put up at:

  http://www.wsgi.org/wsgi/Python_3

From memory, when I first read it I wasn't sure that it was
completely accurate, but at least it no longer mentions mod_python
instead of mod_wsgi, which was mighty confusing. We can perhaps merge
the following into that page, i.e., expand the table, and talk more
about the abstract definitions rather than linking it to specific
implementations at this point. We can perhaps then start capturing the
pros and cons of each option in the page rather than losing them
in the email chain.

OPTION : BARLEY

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are byte strings.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a byte string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application must be a byte string.

6. The list of response headers specified by the WSGI application must
contain tuples consisting of two values, where each value is a byte
string.

7. The iterable returned by the application and from which response
content is derived, must yield byte strings.
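
To make the BARLEY definition concrete, here is a minimal sketch of an application written against it, assuming Python 3. The names `barley_app` and `call_app` are hypothetical; `call_app` is only a tiny driver standing in for a real server.

```python
def barley_app(environ, start_response):
    """Toy application under the all-bytes (BARLEY) reading."""
    path = environ.get(b"PATH_INFO", b"/")   # bytes key, bytes value
    scheme = environ[b"wsgi.url_scheme"]     # also a byte string
    start_response(b"200 OK", [(b"Content-Type", b"text/plain")])
    return [scheme + b" request for " + path + b"\n"]


def call_app(app, environ):
    """Hypothetical mini driver: call the app and collect its response."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(
    barley_app,
    {b"wsgi.url_scheme": b"http", b"PATH_INFO": b"/demo"},
)
```

Every literal carries a `b` prefix, which is the cost this option trades for never decoding anything implicitly.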

OPTION : RYE

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a byte string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application must be a byte string.

6. The list of response headers specified by the WSGI application must
contain tuples consisting of two values, where each value is a byte
string.

7. The iterable returned by the application and from which response
content is derived, must yield byte strings.

OPTION : WHEAT

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are native strings. Where native strings are unicode
strings, ISO-8859-1 encoding would be used such that the original
character data is preserved and as necessary the unicode string can be
converted back to bytes and thence decoded to unicode again using a
different encoding.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application should be a byte
string. Where native strings are unicode strings, the native string
type can also be returned in which case it would be encoded as
ISO-8859-1.

6. The list of response headers specified by the WSGI application
should contain tuples consisting of two values, where each value is a
byte string. Where native strings are unicode strings, the native
string type can also be returned in which case it would be encoded as
ISO-8859-1.

7. The iterable returned by the application and from which response
content is derived, should yield byte strings. Where native strings
are unicode strings, the native string type can also be returned in
which case it would be encoded as ISO-8859-1.
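
Points 3 and 5 to 7 of WHEAT can be sketched as follows, assuming Python 3. The helper names `recode` and `to_wire` are made up for this illustration and do not come from mod_wsgi or any proposal text.

```python
def recode(value, encoding="utf-8"):
    """Recover real text from a WHEAT-style environ value.

    The server decoded the raw bytes as ISO-8859-1, which maps every
    byte to exactly one code point, so encoding back is lossless.
    """
    return value.encode("iso-8859-1").decode(encoding)


def to_wire(value):
    """Server side: accept bytes or a native string for output."""
    if isinstance(value, bytes):
        return value
    return value.encode("iso-8859-1")


raw_path = b"/caf\xc3\xa9"  # UTF-8 bytes as sent by the client
environ = {"PATH_INFO": raw_path.decode("iso-8859-1")}

text_path = recode(environ["PATH_INFO"])  # the path as real text
status_bytes = to_wire("200 OK")          # native string is accepted
```

The ISO-8859-1 step never fails and never loses data, which is what lets the application choose a different encoding after the fact.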

Graham

From ianb at colorstudy.com  Mon Aug 30 18:00:28 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 30 Aug 2010 11:00:28 -0500
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <AANLkTikvGiRxKPG91OZppWs8SqR7spd7rddKvbya87M7@mail.gmail.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com> <i5aqlm$5o4$1@dough.gmane.org>
	<4C78EF4F.3070802@active-4.com> <i5ar4v$6j1$1@dough.gmane.org>
	<AANLkTim9VE3uDFR+EeVbC3VF4mA5G3fYxvVUXD5RsrL-@mail.gmail.com>
	<AANLkTik3sr_=Zgg_TosOro4BEE8=gF2+nwTUfVn5JC9w@mail.gmail.com>
	<20100830030802.747023A4100@sparrow.telecommunity.com>
	<AANLkTikvGiRxKPG91OZppWs8SqR7spd7rddKvbya87M7@mail.gmail.com>
Message-ID: <AANLkTin2CmF=sFgssf6WZ3bBnB8oZXRnUoJM79ji60=F@mail.gmail.com>

Just to narrow in on one case, URLs, there are a few pieces of information
that make up the URL:

wsgi.url_scheme: this is *not* present in the request, it's inferred somehow
(e.g., by the port the client connected to)

HTTP_HOST: this is a header.  It typically contains both the hostname and
the port.  The encoding is generally idna, though you have to split the port
off first.  The unicode version of the hostname is not widely supported in
client libraries (it's usually applied at the UI level).

SCRIPT_NAME/PATH_INFO: these represent a portion of the request path (before
?).  As submitted these are generally ASCII (URL-quoted).  After unquoting,
they are typically UTF-8, but may be of any or no encoding.  If an unsafe
character is present in the URL-quoted version of the path, it may be quoted
at the byte level.  The '?' character is effectively a byte-oriented marker
and encodings cannot affect it.

QUERY_STRING: this is also generally ASCII (URL-quoted).  Unsafe characters
could be quoted at the byte level.
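
A short sketch of how these pieces behave in practice, assuming Python 3 and its standard `idna` codec. The helper `split_host` is hypothetical and deliberately ignores IPv6 literals; the sample host and path are invented.

```python
from urllib.parse import unquote_to_bytes


def split_host(http_host):
    """Split 'host:port' and decode an IDNA host name.

    Returns (unicode_host, port), where port is a string or None.
    IPv6 literals such as '[::1]:80' are not handled in this sketch.
    """
    host, sep, port = http_host.rpartition(":")
    if not sep:
        host, port = http_host, None
    return host.encode("ascii").decode("idna"), port


# The quoted path is plain ASCII; unquoting yields bytes of no
# declared encoding, which are typically (but not always) UTF-8.
quoted_path = "/caf%C3%A9"
path_bytes = unquote_to_bytes(quoted_path)
path_text = path_bytes.decode("utf-8")

host, port = split_host("xn--bcher-kva.example:8080")
```

Note that the port has to be split off before the IDNA decoding, as the message says, and that only the unquoted path needs an encoding guess.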

Generally I'm unaware of any reasonable situation where quoting unsafe
characters in an HTTP request would be improper, or even lose any meaningful
information.  Mostly because I don't know of any clients that actually would
expect unsafe characters to work.  Quoting HTTP_HOST is difficult, as it's
not a byte-oriented quoting, it's a fairly complex encoding.  But I'm also
not sure where in a stack you could actually handle unsafe characters in
HTTP_HOST -- it seems like simply an invalid request, and deferring the
error won't give another part of the stack the opportunity to do the right
thing.

In their quoted form all these values (at least including the quoted path,
not the unquoted SCRIPT_NAME/PATH_INFO) *should* be ASCII, and I believe a
WSGI server could ensure they were all ASCII without any loss of useful
information (either by simply rejecting the request or by applying
quoting).  I don't see any place where bytes are advantageous.  Representing
invalid requests does not seem particularly helpful -- *some* invalid
requests are useful to handle (e.g., weird cookies) but in the case of the
URL variables I don't see any benefit.

IMHO all the tricky encoding issues are in the request and response bodies,
and I'm pretty sure we have consensus that those should be bytes.

Reiterating other encoding issues I'm aware of:

Cookie encodings, but parsing cookies as bytes or Latin1 is basically
equivalent, and I don't believe that, for instance, they should ever be
parsed as UTF-8.  Parsing as bytes might avoid an unnecessary
encoding/decoding, but it's all tricky enough that libraries should do it
anyway, and the encoding overhead alone isn't very important.

Another example is the Atom Title header (
http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-08.html#rfc.section.8.1.2)
but that's supposed to be Latin1 with RFC2047 encodings, and I don't believe
anyone is proposing that RFC2047 encodings be handled generally at the WSGI
layer (I think CherryPy does or used to handle these, but there were many
objections at least on this list about it, in part due to security
concerns).  A 2047 encoding is like "Title:
=?utf-8?q?stuff-with=-escaping?=".
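
For illustration only, this is roughly what handling such a header at the application level could look like with the standard library's `email.header` module, assuming Python 3. The wrapper name `decode_rfc2047` and the sample value are assumptions; nothing here suggests doing this in the WSGI layer itself.

```python
from email.header import decode_header, make_header


def decode_rfc2047(value):
    """Decode RFC 2047 encoded words in a header value to text."""
    return str(make_header(decode_header(value)))


# "caf=C3=A9" is the quoted-printable form of the UTF-8 bytes for
# 'caf' followed by e-acute.
title = decode_rfc2047("=?utf-8?q?caf=C3=A9?=")
plain = decode_rfc2047("plain title")  # values without encoded words pass through
```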

Response headers are equivalent to request headers.  Response status is
constrained by the spec to Latin1, and there are no use cases I know of
(even really obscure ones) where it would be necessary to use other
encodings.

And that's it!  HTTP has a fairly finite amount of surface area.

-- 
Ian Bicking  |  http://blog.ianbicking.org

From pje at telecommunity.com  Mon Aug 30 18:26:48 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 30 Aug 2010 12:26:48 -0400
Subject: [Web-SIG] WSGI for Python 3
In-Reply-To: <AANLkTikvGiRxKPG91OZppWs8SqR7spd7rddKvbya87M7@mail.gmail.com>
References: <AANLkTil1FkTJ8IXtIEXOlGoa_fh0AjO7_nyxy_pwXeUd@mail.gmail.com>
	<4C76FAC3.5010801@active-4.com> <i5aqlm$5o4$1@dough.gmane.org>
	<4C78EF4F.3070802@active-4.com> <i5ar4v$6j1$1@dough.gmane.org>
	<AANLkTim9VE3uDFR+EeVbC3VF4mA5G3fYxvVUXD5RsrL-@mail.gmail.com>
	<AANLkTik3sr_=Zgg_TosOro4BEE8=gF2+nwTUfVn5JC9w@mail.gmail.com>
	<20100830030802.747023A4100@sparrow.telecommunity.com>
	<AANLkTikvGiRxKPG91OZppWs8SqR7spd7rddKvbya87M7@mail.gmail.com>
Message-ID: <20100830162702.650A23A40A5@sparrow.telecommunity.com>

At 02:37 PM 8/30/2010 +1000, Graham Dumpleton wrote:
>Anyway, rather than keep arguing the point, and to move forward, let us
>perhaps start now with the following definitions and new names to
>identify them. We can even go a bit stupid and give each its own code
>name so they are in part more memorable. Any next option based on your
>suggestions about changing the WHEAT option can be called MAIZE. And
>if you think I am going stark raving mad and should be put in a
>white jacket and locked up, you could well be right. I am not a happy
>camper right now, but that is because of many things besides this WSGI
>stuff. :-)
>
>  And yes I know about the page that has been just recently put up at:
>
>   http://www.wsgi.org/wsgi/Python_3
>
> From memory, when I first read it, I wasn't sure that it was
>completely accurate, but at least it doesn't now mention mod_python
>instead of mod_wsgi which was mighty confusing. We can perhaps merge
>the following into that page, ie., expand the table, and talk more
>about the abstract definitions rather than linking it to specific
>implementations at this point. We can perhaps then start capturing the
>pros and cons against each option in the page rather than losing them
>in the email chain.

I've added a column to the page called "flat" that captures my 
current proposal (native keys, surrogateescape values, byte stream 
in, strict bytes-only for all outputs).  This seems to me an optimum 
balance between:

* Verifiability (especially *composable* verifiability)
* Low cognitive overhead (i.e., fewest things to remember)
* Low amount of finger-typing and fewer conversions

But I certainly could be convinced otherwise by example or argument.

(One other thing I consider a plus for this approach, btw: os.environ 
is still largely usable as a WSGI environ in the CGI case.  This 
isn't so much valuable in itself as it is an indicator of low 
complexity and cognitive overhead.)
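
As a rough illustration of the "surrogateescape values" part: undecodable bytes become lone surrogates in the native string and can be recovered exactly. The helper names are invented, and the choice of UTF-8 as the codec is an assumption of this sketch rather than something the proposal fixes here:

```python
def to_environ_value(raw: bytes, encoding: str = "utf-8") -> str:
    # Bytes that are invalid in `encoding` are smuggled through as
    # lone surrogates (U+DC80..U+DCFF) instead of raising an error.
    return raw.decode(encoding, "surrogateescape")

def from_environ_value(value: str, encoding: str = "utf-8") -> bytes:
    # The same error handler on the way out restores the original bytes.
    return value.encode(encoding, "surrogateescape")

raw_path = b"/caf\xe9"                 # Latin-1 bytes, not valid UTF-8
value = to_environ_value(raw_path)     # a native str, usable as an environ value
assert from_environ_value(value) == raw_path
```

On POSIX systems, Python 3 builds os.environ with the same error handler (using the filesystem encoding), which is part of why it stays usable as a WSGI environ in the CGI case.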