From davidgshi at yahoo.co.uk Tue Aug 3 14:41:59 2010 From: davidgshi at yahoo.co.uk (David Shi) Date: Tue, 3 Aug 2010 12:41:59 +0000 (GMT) Subject: [Web-SIG] WAP communicating with server-side Python Message-ID: <536331.71732.qm@web26304.mail.ukl.yahoo.com> Is there an equivalent mailing list for WAP? I am in need of a very simple demo website/webpage that is accessible by a mobile handset and simply gets a few critical pieces of data, e.g. its ID and/or its x, y, z position. http://www.google.co.uk/search?hl=en&q=how+to+mobile+website&aq=8&aqi=g10&aql=&oq=how+to+mobile&gs_rfai= I want to try out moving my Python internet service on to mobile phones. Therefore, the front-end will have to be in WAP. All I need is an excellent demo WAP page to show me how to get the mobile handset's ID and its x, y, z position. Regards. David From armin.ronacher at active-4.com Fri Aug 27 01:37:39 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Fri, 27 Aug 2010 01:37:39 +0200 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: Message-ID: <4C76FAC3.5010801@active-4.com> Hi, Is there a status update on that now I missed? Did something decide on bytes for the environment values or are we still unsure about that? From a discussion I had lately with Graham on #pocoo, it seems like he lost interest in supporting WSGI on Python 3 for the time being due to lack of interest. My personal pet project of actively redesigning WSGI to see if a higher-level protocol would solve the unicode issue better failed and was not worth the effort. As I understand it, Python 3.0/1/2 will be broken for WSGI anyway, so we can stop caring about the stdlib. CherryPy seems to be the only system currently with an actively maintained Python 3 version of WSGI, which from my understanding is based on unicode and bytes, where unicode is seen as latin1.
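Armin's summary of CherryPy's approach ("unicode is seen as latin1") relies on one property of ISO-8859-1: it maps each of the 256 byte values to the code point with the same number, so decoding raw bytes as latin1 can never fail and never loses information. The following is a minimal Python 3 sketch of that round trip; the helper names are hypothetical and not taken from CherryPy or any WSGI draft.

```python
# ISO-8859-1 (latin1) maps every byte 0x00-0xFF to the code point with the
# same number, so decoding arbitrary bytes as latin1 never fails and
# encoding the result again yields the identical bytes.


def bytes_to_native(raw):
    """Decode wire bytes into a Python 3 str, one code point per byte."""
    return raw.decode("latin-1")


def native_to_bytes(text):
    """Recover the original wire bytes from a latin1-decoded str."""
    return text.encode("latin-1")


if __name__ == "__main__":
    every_byte = bytes(range(256))
    as_text = bytes_to_native(every_byte)
    assert len(as_text) == 256                              # one character per byte
    assert all(ord(ch) == i for i, ch in enumerate(as_text))
    assert native_to_bytes(as_text) == every_byte           # lossless round trip
    print("all 256 byte values survive the latin1 round trip")
```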
At that point I don't care at all about what is decided on as long as something is decided. Can someone please stand up and just do that? :) Regards, Armin From pje at telecommunity.com Fri Aug 27 05:45:51 2010 From: pje at telecommunity.com (P.J. Eby) Date: Thu, 26 Aug 2010 23:45:51 -0400 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <4C76FAC3.5010801@active-4.com> References: <4C76FAC3.5010801@active-4.com> Message-ID: <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote: >Hi, > >Is there a status update on that now I missed? Did something decide >on bytes for the environment values or are we still unsure about that? To the extent we're "unsure", I think the holdup is simply that nobody has tried doing an all-bytes WSGI implementation -- unless of course you count all our Python 2.x experience as experience with an all-bytes implementation. ;-) (Of course, that experience won't help us with Python 3 stdlib issues.) >At that point I don't care at all about what is decided on as long >as something is decided. Can someone please stand up and just do that? :) Essentially the problem right now is that unless such a choice is made, there's little hope of getting the stdlib issues to be resolved, because we can't exactly file bug reports against the stdlib if we don't know what we want it to do. ;-) My personal inclination is to define WSGI 2 as a bytes-oriented protocol, and then encourage people to port to WSGI 2 before moving to Python 3. In theory, if we did it correctly it could actually minimize the porting pain for Python 3. In practice, I'm not sure how to do this, as I lack experience with 2to3 at the moment, or any production experience with Python 3 whatsoever. 
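PJE's suggestion of defining WSGI 2 as a bytes-oriented protocol can be pictured with a small application. This is only a sketch under stated assumptions: no WSGI 2 calling convention had been agreed, so it keeps the WSGI 1 style start_response call, and it assumes native-string environ keys with byte strings for the CGI values, the status line, the headers and the body. Because it relies only on b"" literals, it should run unmodified on Python 2.6+ and Python 3.

```python
def application(environ, start_response):
    # Assumed layout: environ keys are native strings, CGI values are bytes.
    path = environ.get("PATH_INFO", b"/")
    body = b"You asked for " + path + b"\n"
    # Status line, header names/values and body chunks are all byte strings.
    status = b"200 OK"
    headers = [
        (b"Content-Type", b"text/plain"),
        (b"Content-Length", str(len(body)).encode("ascii")),
    ]
    start_response(status, headers)
    return [body]


if __name__ == "__main__":
    captured = {}

    def fake_start_response(status, headers):
        # Stand-in for the server: just record what the app passed.
        captured["status"] = status
        captured["headers"] = headers

    chunks = application({"PATH_INFO": b"/hello"}, fake_start_response)
    assert captured["status"] == b"200 OK"
    assert b"".join(chunks) == b"You asked for /hello\n"
```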
From graham.dumpleton at gmail.com Fri Aug 27 06:17:09 2010 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Fri, 27 Aug 2010 14:17:09 +1000 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> Message-ID: On 27 August 2010 13:45, P.J. Eby wrote: > At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote: >> >> Hi, >> >> Is there a status update on that now I missed? Did something decide on >> bytes for the environment values or are we still unsure about that? > > To the extent we're "unsure", I think the holdup is simply that nobody has > tried doing an all-bytes WSGI implementation -- unless of course you count > all our Python 2.x experience as experience with an all-bytes > implementation. ;-) > > (Of course, that experience won't help us with Python 3 stdlib issues.) > > >> At that point I don't care at all about what is decided on as long as >> something is decided. Can someone please stand up and just do that? :) > > Essentially the problem right now is that unless such a choice is made, > there's little hope of getting the stdlib issues to be resolved, because we > can't exactly file bug reports against the stdlib if we don't know what we > want it to do. ;-) > > My personal inclination is to define WSGI 2 as a bytes-oriented protocol, > and then encourage people to port to WSGI 2 before moving to Python 3. Since the major stumbling block, irrespective of other changes, to any sort of agreement is still bytes vs unicode, and since we have a reasonably clear definition of what the unicode suggestion is, can we please as a first step get a definition of what bytes actually implies, so everyone knows what we are talking about? I specifically ask this because it isn't clear, as people don't explain in detail what they mean when they are saying 'bytes'.
Going back to my definition #2 in my blog post from a year ago, I had: 1. The application is passed an instance of a Python dictionary containing what is referred to as the WSGI environment. All keys in this dictionary are native strings. For CGI variables, all names are going to be ISO-8859-1 and so where native strings are unicode strings, that encoding is used for the names of CGI variables. 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI environment, the value of the variable should be a native string. 3. For the CGI variables contained in the WSGI environment, the values of the variables are byte strings. 4. The WSGI input stream 'wsgi.input' contained in the WSGI environment and from which request content is read, should yield byte strings. 5. The status line specified by the WSGI application must be a byte string. 6. The list of response headers specified by the WSGI application must contain tuples consisting of two values, where each value is a byte string. 7. The iterable returned by the application and from which response content is derived, must yield byte strings. The points of disagreement I have seen about this are as follows. For (1), the keys should also be bytes, including names of 'wsgi.' special keys. For (2), the value of 'wsgi.url_scheme' should be bytes. So, do you really want bytes absolutely everywhere, or are keys still going to be unicode, interpreted as ISO-8859-1? Note that we are not agreeing to the final solution here, just what bytes means in contrast to the unicode option, so we know that we are comparing only two options and not many options because people have different interpretations of what bytes means. By contrast, what we generally mean by the unicode option is definition #3 from my blog post. That being: 1. The application is passed an instance of a Python dictionary containing what is referred to as the WSGI environment. All keys in this dictionary are native strings.
For CGI variables, all names are going to be ISO-8859-1 and so where native strings are unicode strings, that encoding is used for the names of CGI variables. 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI environment, the value of the variable should be a native string. 3. For the CGI variables contained in the WSGI environment, the values of the variables are native strings. Where native strings are unicode strings, ISO-8859-1 encoding would be used such that the original character data is preserved and as necessary the unicode string can be converted back to bytes and thence decoded to unicode again using a different encoding. 4. The WSGI input stream 'wsgi.input' contained in the WSGI environment and from which request content is read, should yield byte strings. 5. The status line specified by the WSGI application should be a byte string. Where native strings are unicode strings, the native string type can also be returned in which case it would be encoded as ISO-8859-1. 6. The list of response headers specified by the WSGI application should contain tuples consisting of two values, where each value is a byte string. Where native strings are unicode strings, the native string type can also be returned in which case it would be encoded as ISO-8859-1. 7. The iterable returned by the application and from which response content is derived, should yield byte strings. Where native strings are unicode strings, the native string type can also be returned in which case it would be encoded as ISO-8859-1. Even though we call it unicode, it actually has bytes in places as well. The key issue over bytes vs unicode has been the values in the dictionary, but as pointed out above, it is not clear whether for the bytes option we are talking about bytes for keys as well, and for the value of 'wsgi.url_scheme'. So, can we clarify this first?
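To make the comparison concrete, here is how a single request might appear under each of the two definitions. This is a minimal Python 3 sketch, assuming a request whose percent-decoded PATH_INFO is the UTF-8 encoding of /café; path_info_text is a hypothetical application-side helper, not part of either definition.

```python
# One request, two environ layouts. Keys are native strings in both
# definitions; they differ in the type of the CGI values.
raw_path = b"/caf\xc3\xa9"  # percent-decoded PATH_INFO as UTF-8 bytes

# Definition #2 ("bytes"): CGI values are byte strings,
# while wsgi.url_scheme stays a native string.
environ_bytes = {
    "REQUEST_METHOD": b"GET",
    "PATH_INFO": raw_path,
    "wsgi.url_scheme": "http",
}

# Definition #3 ("unicode"): CGI values are native strings decoded as
# ISO-8859-1, so the original bytes remain recoverable.
environ_unicode = {
    "REQUEST_METHOD": "GET",
    "PATH_INFO": raw_path.decode("iso-8859-1"),
    "wsgi.url_scheme": "http",
}


def path_info_text(environ, encoding="utf-8"):
    """Return PATH_INFO as real text under either definition."""
    value = environ["PATH_INFO"]
    if isinstance(value, str):
        # Definition #3: undo the latin1 decoding to get the wire bytes back.
        value = value.encode("iso-8859-1")
    return value.decode(encoding)


if __name__ == "__main__":
    print(repr(environ_unicode["PATH_INFO"]))  # '/cafÃ©': lossless but not yet meaningful
    assert path_info_text(environ_bytes) == "/caf\u00e9"
    assert path_info_text(environ_unicode) == "/caf\u00e9"
```

Under definition #3 the re-decoding step ends up in application (or framework) code, whereas under definition #2 the application decodes the bytes directly.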
And if you are going to comment, for that extra clarity, cut and paste my definition #2 above and make the changes to it so we have the full definition, rather than just referring to bits. That way people who come and read this don't have to trawl through the whole email chain to derive the context. Once we get that clarification, then we can perhaps discuss exclusively any issues people have with that bytes definition. That is before we even try to balance it against the unicode option or look at other WSGI 2 changes such as dropping start_response and wsgi.file_wrapper. And I apologise in advance if I start getting cranky and people think I am trying to hijack the conversation. I want a solution more so than probably anyone else, as I can't fix up mod_wsgi until there is one, and right now I am feeling pretty unmotivated towards doing anything with mod_wsgi at all, even non-Python 3.X enhancements, because of all this. So, if we can keep focus and try going one step at a time, maybe I will not go ballistic. ;-) Graham From armin.ronacher at active-4.com Fri Aug 27 16:22:47 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Fri, 27 Aug 2010 16:22:47 +0200 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> Message-ID: <4C77CA37.1050603@active-4.com> Hi, On 2010-08-27 5:45 AM, P.J. Eby wrote: > At 01:37 AM 8/27/2010 +0200, Armin Ronacher wrote: > To the extent we're "unsure", I think the holdup is simply that nobody > has tried doing an all-bytes WSGI implementation -- unless of course you > count all our Python 2.x experience as experience with an all-bytes > implementation. ;-) I have a private branch of Werkzeug that is bytes-only. It is untested, unfortunately, because porting the test suite over is a huge task on its own, and not all parts work properly yet. But it's okayish.
Werkzeug does not use anything from the standard library in the latest version except urljoin from the urlparse module, which I would have to rewrite for my little experiment. In my attempt to port it I'm doing the encode/decode dance in a wrapper function. > In theory, if we did it correctly it could actually minimize the porting > pain for Python 3. > > In practice, I'm not sure how to do this, as I lack experience with 2to3 > at the moment, or any production experience with Python 3 whatsoever. The big problem for me is that we *will* have to run 2to3, because WSGI sometimes leaks from the framework to the application. This is especially true for Django, where request.META is directly passed as the WSGI environment to the user and no accessor functions exist. So everybody is parsing the headers themselves there. So when frameworks start to support any version of WSGI on Python 3, they will also have to ship custom 2to3 fixers that add tiny shims for decoding/encoding either side of comparisons etc. For example, it's pretty common to see stuff like this: if 'msie' in request.META.get('HTTP_USER_AGENT', '').lower(): For an all bytes approach a tool would have to recognize that this is from a WSGI environment and change the code to this: if b'msie' in request.META.get('HTTP_USER_AGENT', b'').lower(): That's not impossible to do, and in my mind the right decision, but it also means extra work to be done. And if extra work is required when porting a framework and application over to Python 3, we could reward the people doing that with improvements to the specification itself. I'm thinking about improving file_wrapper (so that middlewares can either detect that a file_wrapper is present and that they should not consume the app iter, or just replacing it with a custom header), the input stream etc.
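Armin's point can be reproduced directly: in Python 3, testing a str literal against a bytes value raises TypeError, so the first form breaks once environ values are bytes, while the b-prefixed rewrite works. A minimal sketch follows, in which meta is a plain dict standing in for Django's request.META and the user-agent string is invented for illustration.

```python
# `meta` stands in for Django's request.META under an all-bytes environ.
meta = {"HTTP_USER_AGENT": b"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"}


def is_msie_unported(meta):
    # Python 2 style: str literal and str default against a bytes value.
    return "msie" in meta.get("HTTP_USER_AGENT", "").lower()


def is_msie_ported(meta):
    # The form a custom 2to3 fixer would need to produce: bytes on both sides.
    return b"msie" in meta.get("HTTP_USER_AGENT", b"").lower()


if __name__ == "__main__":
    try:
        is_msie_unported(meta)
    except TypeError as exc:
        # Python 3 refuses to look for a str inside bytes.
        print("unported check fails:", exc)
    print("ported check:", is_msie_ported(meta))  # ported check: True

    # bytes.lower() only folds ASCII letters: fine for b"msie", but a
    # latin1 capital such as b"\xd6" (Ö) is left unchanged.
    assert b"\xd6ZKAN".lower() == b"\xd6zkan"
```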
Regards, Armin From cito at online.de Fri Aug 27 18:05:15 2010 From: cito at online.de (Christoph Zwerschke) Date: Fri, 27 Aug 2010 18:05:15 +0200 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <4C77CA37.1050603@active-4.com> References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> <4C77CA37.1050603@active-4.com> Message-ID: <4C77E23B.7090108@online.de> On 27.08.2010 16:22, Armin Ronacher wrote: > For an all bytes approach a tool would have to recognize that this is > from a WSGI environment and change the code to this: > > if b'msie' in request.META.get('HTTP_USER_AGENT', b'').lower(): Btw, another problem with this is that the lower() method does not know that it has to use latin1 when lowercasing. For instance, user = 'özkan'.encode('latin1') if user in request.META.get('REMOTE_USER', b'').lower(): will not work if the user has logged in as 'Özkan'. -- Christoph From pje at telecommunity.com Fri Aug 27 18:27:08 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 27 Aug 2010 12:27:08 -0400 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <4C77E23B.7090108@online.de> References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> <4C77CA37.1050603@active-4.com> <4C77E23B.7090108@online.de> Message-ID: <20100827162719.746463A409E@sparrow.telecommunity.com> At 06:05 PM 8/27/2010 +0200, Christoph Zwerschke wrote: > For instance, > >user = 'özkan'.encode('latin1') >if user in request.META.get('REMOTE_USER', b'').lower(): > >will not work if the user has logged in as 'Özkan'. Isn't that a problem with code that does this now? From pje at telecommunity.com Fri Aug 27 19:01:56 2010 From: pje at telecommunity.com (P.J.
Eby) Date: Fri, 27 Aug 2010 13:01:56 -0400 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> Message-ID: <20100827170206.5D0293A409E@sparrow.telecommunity.com> At 02:17 PM 8/27/2010 +1000, Graham Dumpleton wrote: >Since the major stumbling block, irrespective of other changes, to any >sort of agreement is still bytes vs unicode, and since we have a >reasonably clear definition of what the unicode suggestion is, can we >please as a first step get a definition of what bytes actually implies, >so everyone knows what we are talking about? I specifically ask this >because it isn't clear, as people don't explain in detail what they >mean when they are saying 'bytes'. > >Going back to my definition #2 in my blog post from a year ago, I had: > >1. The application is passed an instance of a Python dictionary >containing what is referred to as the WSGI environment. All keys in >this dictionary are native strings. For CGI variables, all names are >going to be ISO-8859-1 and so where native strings are unicode >strings, that encoding is used for the names of CGI variables. FYI, one thing that's changed here is the existence of os.environb in Python 3.2, at least on non-Windows OSes. >2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI >environment, the value of the variable should be a native string. Since any meaningful use of this value is going to end up needing to be bytes again (e.g. Location headers), and for consistency's sake, I lean towards saying this is bytes too. >3. For the CGI variables contained in the WSGI environment, the values >of the variables are byte strings. > >4. The WSGI input stream 'wsgi.input' contained in the WSGI >environment and from which request content is read, should yield byte >strings. > >5. The status line specified by the WSGI application must be a byte string. > >6.
The list of response headers specified by the WSGI application must >contain tuples consisting of two values, where each value is a byte >string. > >7. The iterable returned by the application and from which response >content is derived, must yield byte strings. > >The points of disagreement I have seen about this are as follows. > >For (1), the keys should also be bytes, including names of 'wsgi.' >special keys. > >For (2), the value of 'wsgi.url_scheme' should be bytes. > >So, do you really want bytes absolutely everywhere, or are keys still >going to be unicode, interpreted as ISO-8859-1? If we follow the example of os.environb, then the keys have to be bytes also. However, I can already see that the big problem with all of this is that WSGI code is going to be littered with a plague of "b"s hanging off the front of every string literal, and that 2to3 is probably not going to handle it correctly. Making the keys bytes as well just multiplies the problem. >Note that we are not agreeing to the final solution here, just what >bytes means in contrast to the unicode option, so we know that we are >comparing only two options and not many options because people have >different interpretations of what bytes means. > >By contrast, what we generally mean by the unicode option is >definition #3 from my blog post. That being: > >1. The application is passed an instance of a Python dictionary >containing what is referred to as the WSGI environment. All keys in >this dictionary are native strings. For CGI variables, all names are >going to be ISO-8859-1 and so where native strings are unicode >strings, that encoding is used for the names of CGI variables. > >2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI >environment, the value of the variable should be a native string. > >3. For the CGI variables contained in the WSGI environment, the values >of the variables are native strings.
Where native strings are unicode >strings, ISO-8859-1 encoding would be used such that the original >character data is preserved and as necessary the unicode string can be >converted back to bytes and thence decoded to unicode again using a >different encoding. > >4. The WSGI input stream 'wsgi.input' contained in the WSGI >environment and from which request content is read, should yield byte >strings. > >5. The status line specified by the WSGI application should be a byte >string. Where native strings are unicode strings, the native string >type can also be returned in which case it would be encoded as >ISO-8859-1. > >6. The list of response headers specified by the WSGI application >should contain tuples consisting of two values, where each value is a >byte string. Where native strings are unicode strings, the native >string type can also be returned in which case it would be encoded as >ISO-8859-1. > >7. The iterable returned by the application and from which response >content is derived, should yield byte strings. Where native strings >are unicode strings, the native string type can also be returned in >which case it would be encoded as ISO-8859-1. > >Even though we call it unicode, it actually has bytes in places as well. >The key issue over bytes vs unicode has been the values in the >dictionary, but as pointed out above, it is not clear whether for the >bytes option we are talking about bytes for keys as well, and for the >value of 'wsgi.url_scheme'. The main issue I have with this option is that it seems to make it trivially easy to write an app or piece of middleware that seems to work correctly most of the time, unless placed in the right combination with other apps or middleware.
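Items 5 to 7 of the unicode option let an application return either bytes or native strings, with the server encoding the latter as ISO-8859-1. A minimal Python 3 sketch of that server-side step follows; to_wire and normalize_response are hypothetical names, not part of wsgiref or any draft. Encoding strictly makes non-latin1 text fail loudly, but only when such text actually turns up at run time, which is the composability problem PJE describes.

```python
def to_wire(value):
    """Return bytes for a status line, header name/value or body chunk."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        # Strict ISO-8859-1: raises UnicodeEncodeError above U+00FF.
        return value.encode("iso-8859-1")
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def normalize_response(status, headers, body_iter):
    """Convert what an application returned into bytes for the wire."""
    wire_status = to_wire(status)
    wire_headers = [(to_wire(name), to_wire(val)) for name, val in headers]
    # For simplicity the body is collected into a list; a real server
    # would encode and send each chunk as it is produced.
    wire_body = [to_wire(chunk) for chunk in body_iter]
    return wire_status, wire_headers, wire_body


if __name__ == "__main__":
    # Mixed str and bytes is accepted while every str is latin1-encodable.
    print(normalize_response("200 OK", [("Content-Type", b"text/plain")], ["caf\u00e9"]))
    try:
        to_wire("\u20ac")  # EURO SIGN has no ISO-8859-1 encoding
    except UnicodeEncodeError:
        print("non-latin1 output rejected")
```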
More precisely, an updated wsgiref.validate module used to check the "unicode option" would mark such apps and middleware as perfectly spec-conformant, yet this spec-conformance would not be transitive - i.e., you couldn't say that an assembly of spec-conformant middleware and apps would be correct. Hmmm... unless... I guess the only way to be really sure would be if the validation process randomly changed the types of input and output values to both ways allowed by the spec, and verified that the results were still compliant. ;-) (In practice, I expect that getting it to do that would be rather difficult, though.) Let me see if I can more precisely narrow down my concern. Mostly, it boils down to the possibility of non-latin1 unicode "escaping" into the output stream... so if #5, #6 and #7 above were changed to bytes-only outputs, then an updated validator could enforce those criteria, making spec-compliance verification composable. (That is, if you combine two things that are verified compliant, the combination is also known to be compliant.) So, I could actually support a format that was "unicode (latin1) headers in, bytes headers out", and "bytes stream in, bytes stream out". You can then concentrate all your encoding or decoding operations in one place, or even write a decorator to take care of it for you. >So, can we clarify this first? And if you are going to comment, >for that extra clarity, cut and paste my definition #2 above and make >the changes to it so we have the full definition, rather than just >referring to bits. That way people who come and read this don't have >to trawl through the whole email chain to derive the context. > >Once we get that clarification, then we can perhaps discuss >exclusively any issues people have with that bytes definition. That is >before we even try to balance it against the unicode option or look at >other WSGI 2 changes such as dropping start_response and >wsgi.file_wrapper.
> >And I apologise in advance if I start getting cranky and people think >I am trying to hijack the conversation. I want a solution more so than >probably anyone else, as I can't fix up mod_wsgi until there is one, and >right now I am feeling pretty unmotivated towards doing anything with >mod_wsgi at all, even non-Python 3.X enhancements, because of all this. >So, if we can keep focus and try going one step at a time, maybe I >will not go ballistic. ;-) Thanks for hanging in there, and also for posting this summary! From cito at online.de Fri Aug 27 19:08:43 2010 From: cito at online.de (Christoph Zwerschke) Date: Fri, 27 Aug 2010 19:08:43 +0200 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <20100827162719.746463A409E@sparrow.telecommunity.com> References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> <4C77CA37.1050603@active-4.com> <4C77E23B.7090108@online.de> <20100827162719.746463A409E@sparrow.telecommunity.com> Message-ID: <4C77F11B.2020907@online.de> On 27.08.2010 18:27, P.J. Eby wrote: > At 06:05 PM 8/27/2010 +0200, Christoph Zwerschke wrote: >> user = 'özkan'.encode('latin1') >> if user in request.META.get('REMOTE_USER', b'').lower(): >> >> will not work if the user has logged in as 'Özkan'. > > Isn't that a problem with code that does this now? You mean in Python 2? If the locale is set properly, lower() will account for non-ASCII characters. I don't think Python 3 does this with bytes. -- Christoph From paul.joseph.davis at gmail.com Fri Aug 27 21:26:53 2010 From: paul.joseph.davis at gmail.com (Paul Davis) Date: Fri, 27 Aug 2010 15:26:53 -0400 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> Message-ID: On Fri, Aug 27, 2010 at 12:17 AM, Graham Dumpleton wrote: > Since the major stumbling block, irrespective of other changes, to any > sort of agreement is still bytes vs unicode I ran into this while I was attempting to put together enough code to play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has deftly pointed out, it's a pretty big pain in the rear. Specifically, if we specify that all keys in the environ dictionary are byte strings, then there's a noticeable amount of pain in trying to write code that runs on both platforms. I object to 2to3.py on religious grounds, so when I was implementing this I was doing so with code that would run unmodified on both 2 and 3. What I ran into is that if you want to support versions older than 2.6, all environ key lookups must be wrapped with a helper function. This makes code that uses the dict full of things like environ[b("wsgi.errors")].write(b("some message")) where b is a helper I wrote to convert to the right type for a given interpreter. And I'm still not sure how Jython works with strings. PEP 333 says it's unicode only, which makes me wonder how they would react to the bytes everywhere approach.
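Paul's b() helper itself is not shown in the thread. The following is only a guess at what such a helper might look like when it must also run on Python 2.5, which has no b"" literal; b and Collector are hypothetical names, and Collector merely stands in for a wsgi.errors stream.

```python
import sys

# Hypothetical b(): turn a native-string literal into a byte string on
# whichever interpreter is running. Python 2.5 has no b"" literal, so the
# literal is written as a plain string and converted at run time.
if sys.version_info[0] >= 3:
    def b(s):
        # Python 3: native str is unicode; latin1 gives one byte per character.
        return s.encode("latin-1")
else:
    def b(s):
        # Python 2: native str is already a byte string.
        return s


class Collector(object):
    """Minimal stand-in for a wsgi.errors stream that records writes."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


if __name__ == "__main__":
    # Under an all-bytes environ even the 'wsgi.' keys are byte strings.
    environ = {b("wsgi.errors"): Collector()}
    environ[b("wsgi.errors")].write(b("some message"))
    assert environ[b("wsgi.errors")].chunks == [b("some message")]
```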
I'm also not a big fan of automatically applying a default encoding to *any* of the bytes read in an HTTP request. After contemplating it for a while, I came to the conclusion that header names are really part of the request itself, whereas the other keys in the environ are metadata about the request. Having the two different types of data in the same namespace seemed to be the root of the problem. So I rearranged things so that there's an "http.headers" key that is a dictionary with byte strings for keys and values. I haven't managed to find any time to write a test suite for the spec I was toying with, but I figure it's far enough along that it might be interesting to someone. This code should be runnable on 2.5, 2.6 and 3.2. When I get back to working on it, my next goal is to figure out a way to write the test suite so that it can run against any implementation to test for compliance. Code is at: http://github.com/davisp/wsgiref2 Paul Davis From fumanchu at aminus.org Fri Aug 27 22:04:06 2010 From: fumanchu at aminus.org (Robert Brewer) Date: Fri, 27 Aug 2010 13:04:06 -0700 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com><20100827034601.51C3C3A40A4@sparrow.telecommunity.com> Message-ID: Paul Davis wrote: > > Since the major stumbling block, irrespective of other changes, > > to any sort of agreement is still bytes vs unicode > > I ran into this while I was attempting to put together enough code to > play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has > deftly pointed out, it's a pretty big pain in the rear. > > Specifically, if we specify that all keys in the environ dictionary > are byte strings, then there's a noticeable amount of pain in trying > to write code that runs on both platforms. I object to 2to3.py on > religious grounds, so when I was implementing this I was doing so with > code that would run unmodified on both 2 and 3. Religion is what gets us into this mess. Pragmatism will get us out.
We have two options: 1. Continue to try to write code that runs unmodified on Python 2 and 3, or that runs when 2to3 is applied. There is a morass of corner cases and state machines that behave differently depending on when you look at them lurking here. You can all see where that is getting us: nowhere. By the time you all discover how to write a spec that deals with all the pain points which 2to3 introduces, Python 2 will be dead and you will have wasted your time. 2. Write a Python 3 version of your code. Yes, it's more drudge work. Suck it up. To ameliorate that, make the Python 3 version the default as soon as possible. Deprecate the Python 2 branch. Backport features as necessary to the Python 2 branch (just as Python itself has been doing, if you notice). If you do that, we can write a WSGI for Python 3 now that doesn't suffer from any of the complexities of 2to3. Robert Brewer fumanchu at aminus.org From paul.joseph.davis at gmail.com Fri Aug 27 23:39:34 2010 From: paul.joseph.davis at gmail.com (Paul Davis) Date: Fri, 27 Aug 2010 17:39:34 -0400 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> Message-ID: On Fri, Aug 27, 2010 at 4:04 PM, Robert Brewer wrote: > Paul Davis wrote: >> > Since the major stumbling block, irrespective of other changes, >> > to any sort of agreement is still bytes vs unicode >> >> I ran into this while I was attempting to put together enough code to >> play with a wsgiref2 that ran on both 2.x and 3.x. As Graham has >> deftly pointed out, it's a pretty big pain in the rear. >> >> Specifically, if we specify that all keys in the environ dictionary >> are byte strings, then there's a noticeable amount of pain in trying >> to write code that runs on both platforms. I object to 2to3.py on >> religious grounds, so when I was implementing this I was doing so with >> code that would run unmodified on both 2 and 3.
> > Religion is what gets us into this mess. Pragmatism will get us out. We > have two options: > > 1. Continue to try to write code that runs unmodified on Python 2 and > 3, or that runs when 2to3 is applied. There is a morass of corner cases > and state machines that behave differently depending on when you look at > them lurking here. You can all see where that is getting us: nowhere. By > the time you all discover how to write a spec that deals with all the > pain points which 2to3 introduces, Python 2 will be dead and you will > have wasted your time. > 2. Write a Python 3 version of your code. Yes, it's more drudge work. > Suck it up. To ameliorate that, make the Python 3 version the default as > soon as possible. Deprecate the Python 2 branch. Backport features as > necessary to the Python 2 branch (just as Python itself has been doing, > if you notice). If you do that, we can write a WSGI for Python 3 now > that doesn't suffer from any of the complexities of 2to3. > > > Robert Brewer > fumanchu at aminus.org > No. What got us into this mess was the idea that it would be good to silently typecast unicode objects into bytes. Perhaps I could've been clearer about avoiding 2to3, though. I wanted to avoid coding any of its oddities into a reference implementation because, as you point out, it's just a source of confusion. I'd like to point out that the code I posted works on both 2.x and 3.x. It's fairly easy to implement the backwards-compatible code in Python. There's nothing near the level of requiring a branched/back-port strategy. Not to mention, a branched reference implementation is a bit of a contradiction in terms. The hard part is figuring out a specification that doesn't suck when people try to implement it on multiple interpreters. Also, I think you're overestimating the rate at which people are going to be converting to Python 3. I still have people ask for Python 2.4 support.
I wouldn't be the least bit surprised if there's a WSGI 3 before we deprecate 2.x support. HTH, Paul Davis From armin.ronacher at active-4.com Sat Aug 28 01:24:48 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Sat, 28 Aug 2010 01:24:48 +0200 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <4C77E23B.7090108@online.de> References: <4C76FAC3.5010801@active-4.com> <20100827034601.51C3C3A40A4@sparrow.telecommunity.com> <4C77CA37.1050603@active-4.com> <4C77E23B.7090108@online.de> Message-ID: <4C784940.6090805@active-4.com> Hi, On 2010-08-27 6:05 PM, Christoph Zwerschke wrote: > Btw, another problem with this is that the lower() method does not know > that it has to use latin1 when lowercasing. That is not a problem, insofar as case-insensitive HTTP tokens are limited to ASCII. Regards, Armin From g.brandl at gmx.net Sat Aug 28 13:04:27 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 28 Aug 2010 13:04:27 +0200 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <4C76FAC3.5010801@active-4.com> References: <4C76FAC3.5010801@active-4.com> Message-ID: On 27.08.2010 01:37, Armin Ronacher wrote: > Hi, > > Is there a status update on that now I missed? Did something decide on > bytes for the environment values or are we still unsure about that? > > From a discussion lately I had with Graham on #pocoo it seems like he > lost interest on supporting WSGI on Python 3 for the time being due to > lack of interest. > > My personal pet project of actively redesigning WSGI to see if a > higher-level protocol would solve the unicode issue better failed and > was not worth the effort. > > As I understand Python 3.0/1/2 will be broken for WSGI anyways so we can > stop caring about the stdlib. Let me just throw in here that it's *NOT* too late to do something about Python 3.2. It is not even in beta state yet, and I am very willing to introduce the changes to make web programming work again, or even hold up 3.2 for a bit if you need more time.
However, someone who actually *does* web programming has to do that, in other words, one of you. All I see is complaints that it will not work and one has to forget the stdlib. That is somewhat sad. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From armin.ronacher at active-4.com Sat Aug 28 13:13:19 2010 From: armin.ronacher at active-4.com (Armin Ronacher) Date: Sat, 28 Aug 2010 13:13:19 +0200 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> Message-ID: <4C78EF4F.3070802@active-4.com> Hi, On 2010-08-28 1:04 PM, Georg Brandl wrote: > Let me just throw in here that it's *NOT* too late to do something about > Python 3.2. It is not even in beta state yet, and I am very willing to > introduce the changes to make web programming work again, or even hold > up 3.2 for a bit if you need more time. Sorry if I was not clear. I was talking only about wsgiref here. And for that to be adapted to a possible new WSGI specification we would need more time than you could hold back the 3.2 release, I think. > However, someone who actually *does* web programming has to do that, in > other words, one of you. All I see is complaints that it will not work > and one has to forget the stdlib. That is somewhat sad. While I am not happy with the decisions of the stdlib for unicode in some parts, my mail was not related to that.
Regards, Armin From g.brandl at gmx.net Sat Aug 28 13:12:37 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 28 Aug 2010 13:12:37 +0200 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <4C78EF4F.3070802@active-4.com> References: <4C76FAC3.5010801@active-4.com> <4C78EF4F.3070802@active-4.com> Message-ID: On 28.08.2010 13:13, Armin Ronacher wrote: > Hi, > > On 2010-08-28 1:04 PM, Georg Brandl wrote: >> Let me just throw in here that it's *NOT* too late to do something about >> Python 3.2. It is not even in beta state yet, and I am very willing to >> introduce the changes to make web programming work again, or even hold >> up 3.2 for a bit if you need more time. > Sorry if I was not clear. I was talking only about wsgiref here. And > for that to be adapted to a possible new WSGI specification we would > need more time than you could hold back the 3.2 release, I think. That is certainly true :) Georg From ianb at colorstudy.com Mon Aug 30 03:02:02 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 29 Aug 2010 20:02:02 -0500 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> <4C78EF4F.3070802@active-4.com> Message-ID: Ugh... why are we back at bytes again? I don't know of any concrete problems with using Latin1 (basically how mod_wsgi works). It would be nice to try out some tricky cases -- cookie parsing, HTTP proxies, output-modifying middleware, a few other cases. But I don't see a reason to expect they won't work. It also doesn't feel particularly *wrong*. The parsed portions of the request and response are mostly ASCII anyway, and the exceptions generally require wonky code anyway so a little transcoding isn't so bad.
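To make the Latin-1 approach concrete, here is a sketch assuming Python 3, a hypothetical raw request path, and made-up helper names (`to_environ_value`, `redecode`, `to_wire`). It follows the WHEAT-style rule: the server exposes bytes as a native string decoded with ISO-8859-1, the application recovers the original bytes and re-decodes them with the charset it actually expects, and native-string output is encoded back to ISO-8859-1.

```python
def to_environ_value(raw):
    """Server side: bytes -> native str. Latin-1 maps each byte 0-255 to one code point."""
    return raw.decode("latin-1")


def redecode(value, encoding="utf-8"):
    """Application side: recover the original bytes, then decode with the real charset."""
    return value.encode("latin-1").decode(encoding)


def to_wire(value):
    """Response side: a native str status or header value is encoded as ISO-8859-1.

    Characters outside Latin-1 (e.g. the euro sign) raise UnicodeEncodeError here.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("latin-1")


raw_path = b"/caf\xc3\xa9"              # UTF-8 bytes for "/café" as received
path_info = to_environ_value(raw_path)  # "/cafÃ©": lossless, but not yet meaningful text
assert path_info.encode("latin-1") == raw_path
print(redecode(path_info))              # /café
print(to_wire("200 OK"))                # b'200 OK'
```

This approach relies on Latin-1 being a total, one-to-one mapping over bytes. No information is lost; the cost is the explicit transcoding step Ian mentions.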
-- Ian Bicking | http://blog.ianbicking.org From graham.dumpleton at gmail.com Mon Aug 30 03:16:37 2010 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Mon, 30 Aug 2010 11:16:37 +1000 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> <4C78EF4F.3070802@active-4.com> Message-ID: On 30 August 2010 11:02, Ian Bicking wrote: > Ugh... why are we back at bytes again? Because no official decision, by way of a vote or even consensus, has ever been made, the bytes option never goes away. The problem with bytes, before one even tries to compare it to the text/unicode option, is that there is no clear description of what is meant by the bytes option. As far as I can see, there are potentially multiple interpretations of what is meant by bytes. Although I almost begged that if we are going to discuss bytes, compared to text/unicode, that agreement at least first be made about the definition of the bytes leaning option, that request has pretty well fallen on deaf ears. Thus the discussion yet again is going in the direction of just dithering, with a lot of navel gazing and not much else. As I brought up almost two years ago, if we are going to make any progress on this, we are probably going to need a core group of people nominated who can officially make the decision of what is done based on a proper vote. This will be the only way there is going to be any sort of acceptance of a decision. This idea that we can reach a consensus just isn't working. Graham > I don't know of any concrete > problems with using Latin1 (basically how mod_wsgi works). It would be nice > to try out some tricky cases -- cookie parsing, HTTP proxies, > output-modifying middleware, a few other cases. But I don't see a reason to > expect they won't work. It also doesn't feel particularly *wrong*.
> The > parsed portions of the request and response are mostly ASCII anyway, and the > exceptions generally require wonky code anyway so a little transcoding isn't > so bad. > > -- > Ian Bicking | http://blog.ianbicking.org From pje at telecommunity.com Mon Aug 30 05:07:49 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 29 Aug 2010 23:07:49 -0400 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> <4C78EF4F.3070802@active-4.com> Message-ID: <20100830030802.747023A4100@sparrow.telecommunity.com> At 11:16 AM 8/30/2010 +1000, Graham Dumpleton wrote: >Although I almost begged that if we are going to discuss bytes, >compared to text/unicode, that agreement at least first be made about >the definition of the bytes leaning option, that request has pretty >well fallen on deaf ears. Did you not see my reply? I (thought I) answered your question, and I actually also suggested that a variation of your unicode proposal might work, too. See: http://mail.python.org/pipermail/web-sig/2010-August/004545.html From graham.dumpleton at gmail.com Mon Aug 30 06:37:14 2010 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Mon, 30 Aug 2010 14:37:14 +1000 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: <20100830030802.747023A4100@sparrow.telecommunity.com> References: <4C76FAC3.5010801@active-4.com> <4C78EF4F.3070802@active-4.com> <20100830030802.747023A4100@sparrow.telecommunity.com> Message-ID: On 30 August 2010 13:07, P.J.
Eby wrote: > At 11:16 AM 8/30/2010 +1000, Graham Dumpleton wrote: >> >> Although I almost begged that if we are going to discuss bytes, >> compared to text/unicode, that agreement at least first be made about >> the definition of the bytes leaning option, that request has pretty >> well fallen on deaf ears. > > Did you not see my reply? I (thought I) answered your question, and I > actually also suggested that a variation of your unicode proposal might > work, too. See: > > http://mail.python.org/pipermail/web-sig/2010-August/004545.html I was purely asking about bytes, and what that means to the people who want to push it, setting aside the unicode one for the moment. There have been others as well in the past who have pushed bytes, but they haven't said anything about what it means. I really wanted more input, given that in the past the discussions we core people had over the unicode leaning proposals have in part been derailed by people who sit mostly on the sidelines and start shouting 'I want bytes instead'. So, I want to give those critics their chance to confirm what they mean by bytes, else we will keep having them pop up time and time again when we are trying to discuss other stuff. So it is the lack of response beyond the usual suspects that I am grumpy about. Even in what you mention about bytes you are a bit fuzzy. Having the value of wsgi.url_scheme be bytes is reasonable, and I have no issue with that given that other URL components will be bytes as well, but when you yourself mention keys, you are a bit unsure because of the 'b' plague. So there is still no clarity on that point, and if people are going to keep raising bytes, I would like a better definition of what they are talking about. The only other person who has said anything about bytes is Armin but all that he really said was 'all bytes only'. This isn't much clearer than when people have in the past said 'bytes everywhere', but in some cases didn't actually mean keys.
This is why I asked that people cut and paste the definition I gave and change it to exactly what they meant, so that no one has to second-guess. FWIW, from a separate discussion I understand Armin does mean bytes for keys. So, I was really after that clarity so we can say without confusion that our starting point from now is that we have two overall proposals and that they be A and B as defined, with possibly even a C and D if need be, not even using the labels bytes and unicode. We can then discuss each in isolation as to whether as defined they would work or not. From that, one or more might die, or might mutate further and actually become closer to the other option but where all are still valid options. Either way, people up till now have had this bytes vs unicode divide stuck in their heads, when strictly speaking it isn't necessarily pure bytes vs pure unicode, but merely a number of different proposals with certain bits in one case using unicode instead of bytes. Given that we have dedicated the most time to the unicode leaning solution, I would like to go and look properly at the bytes leaning solutions now. That way we have the definitions and also have done the analysis, and when people come along later and say 'bytes everywhere', we have something proper to refer back to about it. Anyway, rather than keep arguing the point, let us move forward and perhaps start now with the following definitions and new names to identify them. We can even go a bit stupid and give each its own code name so they are in part more memorable. Any next option based on your suggestions about changing the WHEAT option can be called MAIZE. And if you are thinking I am going stark raving mad and should be put in a white jacket and locked up, you could well be right. I am not a happy camper right now, but that is because of many things besides this WSGI stuff.
:-) And yes I know about the page that has been just recently put up at: http://www.wsgi.org/wsgi/Python_3 From memory, when I first read it I wasn't sure that it was completely accurate, but at least it doesn't now mention mod_python instead of mod_wsgi which was mighty confusing. We can perhaps merge the following into that page, i.e., expand the table, and talk more about the abstract definitions rather than linking it to specific implementations at this point. We can perhaps then start capturing the pros and cons against each option in the page rather than losing them in the email chain. OPTION : BARLEY 1. The application is passed an instance of a Python dictionary containing what is referred to as the WSGI environment. All keys in this dictionary are byte strings. 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI environment, the value of the variable should be a byte string. 3. For the CGI variables contained in the WSGI environment, the values of the variables are byte strings. 4. The WSGI input stream 'wsgi.input' contained in the WSGI environment and from which request content is read, should yield byte strings. 5. The status line specified by the WSGI application must be a byte string. 6. The list of response headers specified by the WSGI application must contain tuples consisting of two values, where each value is a byte string. 7. The iterable returned by the application and from which response content is derived, must yield byte strings. OPTION : RYE 1. The application is passed an instance of a Python dictionary containing what is referred to as the WSGI environment. All keys in this dictionary are native strings. For CGI variables, all names are going to be ISO-8859-1 and so where native strings are unicode strings, that encoding is used for the names of CGI variables. 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI environment, the value of the variable should be a byte string. 3.
For the CGI variables contained in the WSGI environment, the values of the variables are byte strings. 4. The WSGI input stream 'wsgi.input' contained in the WSGI environment and from which request content is read, should yield byte strings. 5. The status line specified by the WSGI application must be a byte string. 6. The list of response headers specified by the WSGI application must contain tuples consisting of two values, where each value is a byte string. 7. The iterable returned by the application and from which response content is derived, must yield byte strings. OPTION : WHEAT 1. The application is passed an instance of a Python dictionary containing what is referred to as the WSGI environment. All keys in this dictionary are native strings. For CGI variables, all names are going to be ISO-8859-1 and so where native strings are unicode strings, that encoding is used for the names of CGI variables. 2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI environment, the value of the variable should be a native string. 3. For the CGI variables contained in the WSGI environment, the values of the variables are native strings. Where native strings are unicode strings, ISO-8859-1 encoding would be used such that the original character data is preserved and as necessary the unicode string can be converted back to bytes and thence decoded to unicode again using a different encoding. 4. The WSGI input stream 'wsgi.input' contained in the WSGI environment and from which request content is read, should yield byte strings. 5. The status line specified by the WSGI application should be a byte string. Where native strings are unicode strings, the native string type can also be returned in which case it would be encoded as ISO-8859-1. 6. The list of response headers specified by the WSGI application should contain tuples consisting of two values, where each value is a byte string.
Where native strings are unicode strings, the native string type can also be returned in which case it would be encoded as ISO-8859-1. 7. The iterable returned by the application and from which response content is derived, should yield byte strings. Where native strings are unicode strings, the native string type can also be returned in which case it would be encoded as ISO-8859-1. Graham From ianb at colorstudy.com Mon Aug 30 18:00:28 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 30 Aug 2010 11:00:28 -0500 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> <4C78EF4F.3070802@active-4.com> <20100830030802.747023A4100@sparrow.telecommunity.com> Message-ID: Just to narrow in on one case, URLs, there are a few pieces of information that make up the URL: wsgi.url_scheme: this is *not* present in the request, it's inferred somehow (e.g., by the port the client connected to) HTTP_HOST: this is a header. It typically contains both the hostname and the port. The encoding is generally idna, though you have to split the port off first. The unicode version of the hostname is not widely supported in client libraries (it's usually applied at the UI level). SCRIPT_NAME/PATH_INFO: these represent a portion of the request path (before ?). As submitted these are generally ASCII (URL-quoted). After unquoting, they are typically UTF-8, but may be of any or no encoding. If an unsafe character is present in the URL-quoted version of the path, it may be quoted at the byte level. The '?' character is effectively a byte-oriented marker and encodings cannot affect it. QUERY_STRING: this is also generally ASCII (URL-quoted). Unsafe characters could be quoted at the byte level. Generally I'm unaware of any reasonable situation where quoting unsafe characters in an HTTP request would be improper, or even lose any meaningful information. Mostly because I don't know of any clients that actually would expect unsafe characters to work. 
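The URL pieces Ian describes above can be sketched as follows. The sketch assumes Python 3, a WHEAT-style environ whose values are Latin-1-decoded native strings, and hypothetical helper names. It splits the port off `HTTP_HOST` before applying the standard `idna` codec, and on the server side it splits the raw request target at the byte-level `?` and unquotes the path to bytes. Following CGI convention, the query string stays URL-quoted.

```python
from urllib.parse import unquote_to_bytes


def split_host(http_host):
    """Split a Latin-1 native HTTP_HOST into (hostname, port), decoding IDNA labels."""
    if http_host.startswith("["):                    # IPv6 literal, e.g. "[::1]:8080"
        host, _, rest = http_host.partition("]")
        return host + "]", rest.lstrip(":")
    host, _, port = http_host.partition(":")         # split the port off first
    # The idna codec turns "xn--..." labels back into Unicode; raw non-ASCII
    # bytes raise a UnicodeError, which a server could treat as a bad request.
    return host.encode("latin-1").decode("idna"), port


def split_request_target(target):
    """Server side: raw request-target bytes -> (PATH_INFO, QUERY_STRING) native strs."""
    path, _, query = target.partition(b"?")               # '?' is a byte-level marker
    path_info = unquote_to_bytes(path).decode("latin-1")  # unquoted, as CGI expects
    query_string = query.decode("latin-1")                # left URL-quoted
    return path_info, query_string


path_info, query_string = split_request_target(b"/caf%C3%A9/x?a=1&b=%20")
print(path_info.encode("latin-1").decode("utf-8"))   # /café/x  (app-side re-decode)
print(split_host("xn--bcher-kva.example:8080"))      # ('bücher.example', '8080')
```

In this sketch, every value handed to the application is plain ASCII or a lossless Latin-1 image of the original bytes. That is consistent with Ian's point that bytes offer no particular advantage for the URL variables.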
Quoting HTTP_HOST is difficult, as it's not a byte-oriented quoting, it's a fairly complex encoding. But I'm also not sure where in a stack you could actually handle unsafe characters in HTTP_HOST -- it seems like simply an invalid request, and deferring the error won't give another part of the stack the opportunity to do the right thing. In their quoted form all these values (at least including the quoted path, not the unquoted SCRIPT_NAME/PATH_INFO) *should* be ASCII, and I believe a WSGI server could ensure they were all ASCII without any loss of useful information (either by simply rejecting the request or by applying quoting). I don't see any place where bytes are advantageous. Representing invalid requests does not seem particularly helpful -- *some* invalid requests are useful to handle (e.g., weird cookies) but in the case of the URL variables I don't see any benefit. IMHO all the tricky encoding issues are in the request and response bodies, and I'm pretty sure we have consensus that those should be bytes. Reiterating other encoding issues I'm aware of: Cookie encodings, but parsing cookies as bytes or Latin1 is basically equivalent, and I don't believe that, for instance, they should ever be parsed as UTF-8. Parsing as bytes might avoid an unnecessary encoding/decoding, but it's all tricky enough that libraries should do it anyway, and the encoding overhead alone isn't very important. Another example is the Atom Title header ( http://bitworking.org/projects/atom/draft-ietf-atompub-protocol-08.html#rfc.section.8.1.2) but that's supposed to be Latin1 with RFC2047 encodings, and I don't believe anyone is proposing that RFC2047 encodings be handled generally at the WSGI layer (I think CherryPy does or used to handle these, but there were many objections at least on this list about it, in part due to security concerns). A 2047 encoding is like "Title: =?utf-8?q?stuff-with=-escaping?=". Response headers are equivalent to request headers. 
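The two encoding issues Ian lists above, cookies and RFC 2047 headers such as the Atom `Title`, can be checked with a small sketch. It assumes Python 3. `parse_cookie` is a hypothetical, naive splitter on `;` and `=`, not the full cookie grammar. It shows that parsing the raw header as bytes or as Latin-1 text yields the same result once re-encoded. The last part uses the standard library's `email.header.decode_header` to decode an RFC 2047 encoded-word at the application level, not in the WSGI layer. The header value is a made-up example.

```python
from email.header import decode_header


def parse_cookie(header, sep, eq):
    """Naive cookie split; behaves the same for bytes or str given matching separators."""
    pairs = {}
    for part in header.split(sep):
        name, _, value = part.strip().partition(eq)
        if name:
            pairs[name] = value
    return pairs


raw = b"session=abc123; name=caf\xc3\xa9"
as_bytes = parse_cookie(raw, b";", b"=")
as_text = parse_cookie(raw.decode("latin-1"), ";", "=")

# Re-encoding the Latin-1 results reproduces the byte-level parse exactly.
assert {k.encode("latin-1"): v.encode("latin-1") for k, v in as_text.items()} == as_bytes

# RFC 2047 encoded-word, decoded by the application rather than the WSGI layer.
(payload, charset), = decode_header("=?utf-8?q?caf=C3=A9?=")
print(payload.decode(charset))   # café
```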
Response status is constrained by the spec to Latin1, and there are no use cases I know of (even really obscure ones) where it would be necessary to use other encodings. And that's it! HTTP has a fairly finite amount of surface area. -- Ian Bicking | http://blog.ianbicking.org From pje at telecommunity.com Mon Aug 30 18:26:48 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 30 Aug 2010 12:26:48 -0400 Subject: [Web-SIG] WSGI for Python 3 In-Reply-To: References: <4C76FAC3.5010801@active-4.com> <4C78EF4F.3070802@active-4.com> <20100830030802.747023A4100@sparrow.telecommunity.com> Message-ID: <20100830162702.650A23A40A5@sparrow.telecommunity.com> At 02:37 PM 8/30/2010 +1000, Graham Dumpleton wrote: >Anyway, rather than keep arguing the point, let us move forward and >perhaps start now with the following definitions and new names to >identify them. We can even go a bit stupid and give each its own code >name so they are in part more memorable. Any next option based on your >suggestions about changing the WHEAT option can be called MAIZE. And >if you are thinking I am going stark raving mad and should be put in a >white jacket and locked up, you could well be right. I am not a happy >camper right now, but that is because of many things besides this WSGI >stuff. :-) > > And yes I know about the page that has been just recently put up at: > > http://www.wsgi.org/wsgi/Python_3 > > From memory, when I first read it I wasn't sure that it was >completely accurate, but at least it doesn't now mention mod_python >instead of mod_wsgi which was mighty confusing. We can perhaps merge >the following into that page, i.e., expand the table, and talk more >about the abstract definitions rather than linking it to specific >implementations at this point. We can perhaps then start capturing the >pros and cons against each option in the page rather than losing them >in the email chain.
I've added a column to the page called "flat" that captures my current proposal (native keys, surrogateescape values, byte stream in, strict bytes-only for all outputs). This seems to me an optimum balance between:

* Verifiability (especially *composable* verifiability)
* Low cognitive overhead (i.e., fewest things to remember)
* Low amount of finger-typing and fewer conversions

But I certainly could be convinced otherwise by example or argument.

(One other thing I consider a plus for this approach, btw: os.environ is still largely usable as a WSGI environ in the CGI case. This isn't so much a valuable thing in itself, as that it's an indicator of low complexity and cognitive overhead.)
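Here is a rough sketch of what the "flat" column implies, under stated assumptions. Keys are native strings. Values are decoded with the `surrogateescape` error handler so that undecodable bytes survive a round trip. Everything the application returns must be bytes. The codec is assumed to be UTF-8; the proposal as summarized here does not name one. On POSIX, Python 3 populates `os.environ` the same way, using the filesystem encoding with `surrogateescape`, which is why the CGI comparison holds. `native_value`, `wire_value` and `check_response` are hypothetical names, not part of any spec.

```python
CODEC = "utf-8"   # assumption: the proposal would need to pin the codec down


def native_value(raw):
    """Request side: bytes -> native str; undecodable bytes become lone surrogates."""
    return raw.decode(CODEC, "surrogateescape")


def wire_value(value):
    """Inverse of native_value: gives back exactly the original bytes."""
    return value.encode(CODEC, "surrogateescape")


def check_response(status, headers, body):
    """Strict bytes-only outputs: reject any native str the application returns."""
    if not isinstance(status, bytes):
        raise TypeError("status must be bytes")
    for name, value in headers:
        if not (isinstance(name, bytes) and isinstance(value, bytes)):
            raise TypeError("header names and values must be bytes")
    return _checked_body(body)


def _checked_body(body):
    """Lazily verify each body chunk as the server iterates over it."""
    for chunk in body:
        if not isinstance(chunk, bytes):
            raise TypeError("body chunks must be bytes")
        yield chunk


raw = b"/caf\xe9"                # Latin-1 bytes, not valid UTF-8
value = native_value(raw)        # '/caf\udce9'
assert wire_value(value) == raw  # round trip preserves every byte
```

Because each output rule is a plain type check, it composes. A middleware can apply the same check to whatever it passes along, which illustrates the "composable verifiability" argument.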