From graham.dumpleton at gmail.com  Wed Apr  1 12:29:18 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Wed, 1 Apr 2009 21:29:18 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
Message-ID: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>

Based on any discussions at PyCon, can someone give a summary of any
conclusions drawn about how WSGI 1.0 should be implemented in Python
3.0?

The previous analysis of this is at:

  http://www.wsgi.org/wsgi/Amendments_1.0

I realise it may be a work in progress, but I note that the work Robert
is doing on the CherryPy WSGI server for Python 3.0 isn't necessarily
following that, and is perhaps starting to do things in a way that I
understood was only being speculated upon for WSGI 2.0, not for
WSGI 1.0. For example:

  http://www.cherrypy.org/changeset/2199

In particular, it has:

  environ["SCRIPT_NAME"] = b""

The bit from prior analysis which is relevant is:

"""When running under Python 3, servers MUST provide CGI HTTP
variables as strings, decoded from the headers using HTTP standard
encodings (i.e. latin-1 + RFC 2047) (Open question: are there any CGI
or WSGI variables that should NOT be strings?)"""

Since mod_wsgi has used the prior analysis as the basis of its Python
3.0 support, I would like to know pretty soon what direction WSGI 1.0
under Python 3.0 is going to take; otherwise I am going to have to delay
releasing mod_wsgi 3.0 or simply yank the support for Python 3.0.

Robert, yes I know I could have asked you directly, but I want a consensus
from all who were present at PyCon and discussed these things.

Graham

From fumanchu at aminus.org  Wed Apr  1 14:18:37 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 1 Apr 2009 05:18:37 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>

Graham Dumpleton wrote:
> Based on any discussions at PyCon, can someone give a summary of any
> conclusions drawn about how WSGI 1.0 should be implemented in Python
> 3.0.
> 
> The previous analysis of this is at:
> 
>   http://www.wsgi.org/wsgi/Amendments_1.0
> 
> I realise it may be work in progress, but I note that work being done
> on WSGI server associated with CherryPy for Python 3.0 by Robert isn't
> necessarily following that and is perhaps starting to do things in a
> way that I understood were only being speculated upon for WSGI 2.0,
> not for WSGI 1.0. For example:
> 
>   http://www.cherrypy.org/changeset/2199
> 
> In particular, it has:
> 
>   environ["SCRIPT_NAME"] = b""
> 
> The bit from prior analysis which is relevant is:
> 
> """When running under Python 3, servers MUST provide CGI HTTP
> variables as strings, decoded from the headers using HTTP standard
> encodings (i.e. latin-1 + RFC 2047) (Open question: are there any CGI
> or WSGI variables that should NOT be strings?)"""
> 
> Since mod_wsgi has used the prior analysis as basis of Python 3.0
> support, would want to know pretty soon what direction WSGI 1.0 under
> Python 3.0 is going to take, else I am going to have to delay
> releasing mod_wsgi 3.0 or simply yank the support for Python 3.0.
> 
> Robert, yes I know I could have asked you direct, but want a consensus
> from all who were present at PyCon and discussed these things.

Good timing. We had been thinking to make everything strings except for
SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
from the Request-URI, which may be in any encoding. It was thought that
the app would be best-qualified to decode those three.

I hope to discuss that further this morning at the sprints. Turns out
the cgi module in Python 3 only does text, not bytes. I considered
submitting a patch to make it handle bytes for fp/environ but that
became difficult quickly and may complicate the cgi module needlessly if
we can instead use unicode for those 3 environ entries. I'll report back
here.


Robert Brewer
fumanchu at aminus.org


From guido at python.org  Wed Apr  1 18:34:24 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 1 Apr 2009 09:34:24 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com> 
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
Message-ID: <ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>

On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org> wrote:
> Good timing. We had been thinking to make everything strings except for
> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
> from the Request-URI, which may be in any encoding. It was thought that
> the app would be best-qualified to decode those three.

Argh. The *meaning* of these fields is clearly text. It would be most
unfortunate if all apps were required to deal with decoding bytes
for these (there is no choice any more, unlike in 2.x). I appreciate
the sentiment that the encoding is unknown, but I would much prefer it
if there was a default encoding that the app could override, or if
there was some other mechanism whereby the app would not have to be
bothered with decoding bytes unless it cared.

Note that Py3k also treats filenames as text, with an optional escape
hatch for using bytes that only very few apps will need to use.

> I hope to discuss that further this morning at the sprints. Turns out
> the cgi module in Python 3 only does text, not bytes. I considered
> submitting a patch to make it handle bytes for fp/environ but that
> became difficult quickly and may complicate the cgi module needlessly if
> we can instead use unicode for those 3 environ entries. I'll report back
> here.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From fumanchu at aminus.org  Wed Apr  1 18:37:38 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 1 Apr 2009 09:37:38 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407C582A7@ex10.hostedexchange.local>

Guido van Rossum wrote:
> Sent: Wednesday, April 01, 2009 9:34 AM
> To: Robert Brewer
> Cc: Web SIG
> Subject: Re: [Web-SIG] Python 3.0 and WSGI 1.0.
> 
> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org>
> wrote:
> > Good timing. We had been thinking to make everything strings except
> for
> > SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
> > from the Request-URI, which may be in any encoding. It was thought
> that
> > the app would be best-qualified to decode those three.
> 
> Argh. The *meaning* of these fields is clearly text. It would be most
> unfortunately if all apps were required to deal with decoding bytes
> for these (there is no choice any more, unlike in 2.x). I appreciate
> the sentiment that the encoding is unknown, but I would much prefer it
> if there was a default encoding that the app could override, or if
> there was some other mechanism whereby the app would not have to be
> bothered with decoding bytes unless it cared.
> 
> Note that Py3k also treats filenames as text, with an optional escape
> hatch for using bytes that only very few apps will need to use.

Understood. I think we have plenty of options here for returning text.
We'll discuss this ASAP in the room.


Robert Brewer
fumanchu at aminus.org


From janssen at parc.com  Wed Apr  1 19:59:56 2009
From: janssen at parc.com (Bill Janssen)
Date: Wed, 1 Apr 2009 10:59:56 PDT
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
Message-ID: <86217.1238608796@parc.com>

Guido van Rossum <guido at python.org> wrote:

> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org> wrote:
> > Good timing. We had been thinking to make everything strings except for
> > SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
> > from the Request-URI, which may be in any encoding. It was thought that
> > the app would be best-qualified to decode those three.
> 
> Argh. The *meaning* of these fields is clearly text.

I wouldn't read too much into those names -- they were chosen when the
CGI spec was just gestating, long before the usage patterns solidified,
and don't necessarily reflect the usage of the data bound to them.  I
believe this work was done before the formal IETF definition of a URL,
for instance.

I think the controlling reference here is RFC 3875.

It's not at all clear to me what the SCRIPT_NAME is.  Is it a pathname,
involving the local file system's filenames, which recent discussions
seem to indicate may or may not correspond to human-notional strings, or
a URI path?  I'm OK with calling it text, with a proviso that there may
be cases where it's not.

I've never actually seen a CGI call with PATH_INFO set; I think it's
obsolete usage (but pretty clearly a string).  RFC 3875 says, "Similarly,
treatment of non US-ASCII characters in the path is system-defined."

QUERY_STRING -- should always be an ASCII string.  May indeed encode
non-Unicode strings or purely binary data, but when passed to the CGI
script, it's still encoded as it was in the URI.

Bill

From ianb at colorstudy.com  Wed Apr  1 21:15:17 2009
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 1 Apr 2009 14:15:17 -0500
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com> 
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local> 
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
Message-ID: <b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>

On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum <guido at python.org> wrote:
> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org> wrote:
>> Good timing. We had been thinking to make everything strings except for
>> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
>> from the Request-URI, which may be in any encoding. It was thought that
>> the app would be best-qualified to decode those three.
>
> Argh. The *meaning* of these fields is clearly text. It would be most
> unfortunately if all apps were required to deal with decoding bytes
> for these (there is no choice any more, unlike in 2.x). I appreciate
> the sentiment that the encoding is unknown, but I would much prefer it
> if there was a default encoding that the app could override, or if
> there was some other mechanism whereby the app would not have to be
> bothered with decoding bytes unless it cared.

This might be fine, except it is hard.  You can't just take arbitrary
bytes and do script_name.decode('utf8'), and then when you realize you
had it wrong do script_name.encode('utf8').decode('latin1').
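
A minimal sketch of why, assuming Python 3 and a made-up path value
(the byte string below is purely illustrative): a strict UTF-8 decode
of Latin-1 bytes fails, and a lenient one throws away the byte you
would need to recover the original.

```python
# Hypothetical raw path bytes: "/café" encoded as Latin-1, not UTF-8.
raw = b"/caf\xe9"

# A strict UTF-8 decode of these bytes fails outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decode succeeds, but the offending byte becomes U+FFFD.
guessed = raw.decode("utf-8", "replace")

# Trying to undo the wrong guess does not bring the original text back,
# because the 0xE9 byte is no longer present anywhere in the string.
recovered = guessed.encode("utf-8").decode("latin-1")
```

Once the wrong decode has happened, the information is gone; the only
safe recovery path is one that still has the original bytes.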


-- 
Ian Bicking  |  http://blog.ianbicking.org

From guido at python.org  Wed Apr  1 22:09:17 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 1 Apr 2009 13:09:17 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com> 
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local> 
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com> 
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
Message-ID: <ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>

On Wed, Apr 1, 2009 at 12:15 PM, Ian Bicking <ianb at colorstudy.com> wrote:
> On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum <guido at python.org> wrote:
>> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org> wrote:
>>> Good timing. We had been thinking to make everything strings except for
>>> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
>>> from the Request-URI, which may be in any encoding. It was thought that
>>> the app would be best-qualified to decode those three.
>>
>> Argh. The *meaning* of these fields is clearly text. It would be most
>> unfortunately if all apps were required to deal with decoding bytes
>> for these (there is no choice any more, unlike in 2.x). I appreciate
>> the sentiment that the encoding is unknown, but I would much prefer it
>> if there was a default encoding that the app could override, or if
>> there was some other mechanism whereby the app would not have to be
>> bothered with decoding bytes unless it cared.
>
> This might be fine, except it is hard.  You can't just take arbitrary
> bytes and do script_name.decode('utf8'), and then when you realize you
> had it wrong do script_name.encode('utf8').decode('latin1').

Well you could make the bytes versions available under different keys.
I think you do something a bit similar to this in webob, e.g. req.params
vs. req.str_params. (Perhaps you could have QUERY_STRING and
QUERY_BYTES.) The decode() call used to create the text strings could
use 'replace' as the error handler and the app could check for the
presence of the replacement character ('\ufffd') in the string to see
if there was a problem; or it could just work with the string
containing that character and report to the user some kind of 40x or 50x
error. Frameworks (like webob) would of course do the right thing and
look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
be optional.
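
Roughly what I have in mind, as a sketch only. QUERY_BYTES is the
hypothetical key from the paragraph above, the helper names are made
up, and UTF-8 as the default encoding is just an assumption for the
example:

```python
def add_query_keys(environ, raw_query, encoding="utf-8"):
    """Store the query string as text, plus the raw bytes under a second key.

    QUERY_BYTES is a hypothetical key, not part of any WSGI spec.
    """
    # 'replace' never raises; undecodable bytes turn into U+FFFD.
    environ["QUERY_STRING"] = raw_query.decode(encoding, "replace")
    environ["QUERY_BYTES"] = raw_query
    return environ


def query_is_suspect(environ):
    # U+FFFD in the text usually means the default encoding was wrong
    # (a client could also have sent a literal U+FFFD, so this is a hint).
    return "\ufffd" in environ["QUERY_STRING"]
```

An app that does not care about encodings reads QUERY_STRING and, at
most, calls something like query_is_suspect(); a framework that does
care goes straight to the bytes.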

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From graham.dumpleton at gmail.com  Wed Apr  1 22:51:35 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 2 Apr 2009 07:51:35 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
Message-ID: <88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>

2009/4/2 Guido van Rossum <guido at python.org>:
> On Wed, Apr 1, 2009 at 12:15 PM, Ian Bicking <ianb at colorstudy.com> wrote:
>> On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum <guido at python.org> wrote:
>>> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org> wrote:
>>>> Good timing. We had been thinking to make everything strings except for
>>>> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are pulled
>>>> from the Request-URI, which may be in any encoding. It was thought that
>>>> the app would be best-qualified to decode those three.
>>>
>>> Argh. The *meaning* of these fields is clearly text. It would be most
>>> unfortunately if all apps were required to deal with decoding bytes
>>> for these (there is no choice any more, unlike in 2.x). I appreciate
>>> the sentiment that the encoding is unknown, but I would much prefer it
>>> if there was a default encoding that the app could override, or if
>>> there was some other mechanism whereby the app would not have to be
>>> bothered with decoding bytes unless it cared.
>>
>> This might be fine, except it is hard.  You can't just take arbitrary
>> bytes and do script_name.decode('utf8'), and then when you realize you
>> had it wrong do script_name.encode('utf8').decode('latin1').
>
> Well you could make the bytes versions available under different keys.
> I think you do something a bit similar this in webob, e.g. req.params
> vs. req.str_params. (Perhaps you could have QUERY_STRING and
> QUERY_BYTES.) The decode() call used to create the text strings could
> use 'replace' as the error handler and the app could check for the
> presence of the replacement character ('\ufffd') in the string to see
> if there was a problem; or it could just work with the string
> containing that character and report the user some kind of 40x or 50x
> error. Frameworks (like webob) would of course do the right thing and
> look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
> be optional.

Can we please not invent new names at the top level of the WSGI
environment dictionary, especially ones that mutate existing names
rather than using a prefix or suffix?

If we are going to carry values in two different formats, then use the
'wsgi' name space. Thus, for byte versions of values perhaps use:

  wsgi.request_uri
  wsgi.script_name
  wsgi.path_info
  wsgi.query_string
  etc

In other words, leave all the existing CGI variables to come through
decoded as latin-1 and do anything new in the 'wsgi' variable namespace,
identifying only the minimal set which needs to be made available as
bytes.
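
As a rough sketch of that layout under Python 3 (build_environ is a
made-up helper, and the 'wsgi.*' keys are only the names suggested
above, not anything that has been agreed):

```python
def build_environ(raw_script_name, raw_path_info, raw_query_string):
    """Populate text CGI keys plus hypothetical bytes keys under 'wsgi.'."""
    environ = {
        # Existing CGI names stay text. Latin-1 decoding never fails and
        # maps each byte to exactly one character.
        "SCRIPT_NAME": raw_script_name.decode("latin-1"),
        "PATH_INFO": raw_path_info.decode("latin-1"),
        "QUERY_STRING": raw_query_string.decode("latin-1"),
    }
    # Only the minimal set that needs it is also exposed as bytes.
    environ["wsgi.script_name"] = raw_script_name
    environ["wsgi.path_info"] = raw_path_info
    environ["wsgi.query_string"] = raw_query_string
    return environ
```

Existing middleware keeps reading the upper-case names unchanged; only
code that needs the raw bytes has to know about the new keys.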

Graham

From fumanchu at aminus.org  Thu Apr  2 00:30:02 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 1 Apr 2009 15:30:02 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com><F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local><ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com><b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com><ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
	<88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407C58781@ex10.hostedexchange.local>

Graham Dumpleton wrote:
> 2009/4/2 Guido van Rossum <guido at python.org>:
> > On Wed, Apr 1, 2009 at 12:15 PM, Ian Bicking <ianb at colorstudy.com>
> wrote:
> >> On Wed, Apr 1, 2009 at 11:34 AM, Guido van Rossum <guido at python.org>
> wrote:
> >>> On Wed, Apr 1, 2009 at 5:18 AM, Robert Brewer <fumanchu at aminus.org>
> wrote:
> >>>> Good timing. We had been thinking to make everything strings
> except for
> >>>> SCRIPT_NAME, PATH_INFO, and QUERY_STRING, since these few are
> pulled
> >>>> from the Request-URI, which may be in any encoding. It was thought
> that
> >>>> the app would be best-qualified to decode those three.
> >>>
> >>> Argh. The *meaning* of these fields is clearly text. It would be
> most
> >>> unfortunately if all apps were required to deal with decoding bytes
> >>> for these (there is no choice any more, unlike in 2.x). I
> appreciate
> >>> the sentiment that the encoding is unknown, but I would much prefer
> it
> >>> if there was a default encoding that the app could override, or if
> >>> there was some other mechanism whereby the app would not have to be
> >>> bothered with decoding bytes unless it cared.
> >>
> >> This might be fine, except it is hard.  You can't just take
> arbitrary
> >> bytes and do script_name.decode('utf8'), and then when you realize
> you
> >> had it wrong do script_name.encode('utf8').decode('latin1').
> >
> > Well you could make the bytes versions available under different
> keys.
> > I think you do something a bit similar this in webob, e.g. req.params
> > vs. req.str_params. (Perhaps you could have QUERY_STRING and
> > QUERY_BYTES.) The decode() call used to create the text strings could
> > use 'replace' as the error handler and the app could check for the
> > presence of the replacement character ('\ufffd') in the string to see
> > if there was a problem; or it could just work with the string
> > containing that character and report the user some kind of 40x or 50x
> > error. Frameworks (like webob) would of course do the right thing and
> > look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
> > be optional.
> 
> Can we please not invent new names at global context in WSGI
> environment dictionary, especially ones that mutate existing names
> rather than using a prefix or suffix.
> 
> If we are going to carry values in two different formats, then use the
> 'wsgi' name space. Thus, for byte versions of values perhaps use:
> 
>   wsgi.request_uri
>   wsgi.script_name
>   wsgi.path_info
>   wsgi.query_string
>   etc
> 
> In other words, leave all the existing CGI variables to come through
> as latin-1 decode and do anything new in 'wsgi' variable namespace,
> identifying only the minimal set which needs to be made available as
> bytes.

Some thoughts:

 1. If we always decode as Latin-1 it should be lossless, and consumers could retrieve the original bytes with val.encode('Latin-1'), thus removing the need for separate entries.

 2. CGI says, "REMOTE_USER = *OCTET" :(

 3. Bikeshed: "wsgi.xyz" is too close to "XYZ" in my opinion.
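
To illustrate point 1, a quick sketch (the path value is made up). In
Python 3 the bytes come back from a str via encode(), and Latin-1
covers all 256 byte values, so the round trip cannot lose anything:

```python
# Every possible byte value maps to exactly one code point under Latin-1.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_tripped = as_text.encode("latin-1")

# Hypothetical request path that the client actually sent as UTF-8.
raw_path = "/caf\u00e9".encode("utf-8")

# The server hands the app a Latin-1-decoded string (mojibake, but lossless).
server_value = raw_path.decode("latin-1")

# An app that knows better recovers the bytes and decodes them itself.
app_value = server_value.encode("latin-1").decode("utf-8")
```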


Robert Brewer
fumanchu at aminus.org


From pje at telecommunity.com  Thu Apr  2 00:37:49 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 01 Apr 2009 18:37:49 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
Message-ID: <20090401223524.F138A3A40A7@sparrow.telecommunity.com>

At 01:09 PM 4/1/2009 -0700, Guido van Rossum wrote:
>Well you could make the bytes versions available under different keys.
>I think you do something a bit similar this in webob, e.g. req.params
>vs. req.str_params. (Perhaps you could have QUERY_STRING and
>QUERY_BYTES.) The decode() call used to create the text strings could
>use 'replace' as the error handler and the app could check for the
>presence of the replacement character ('\ufffd') in the string to see
>if there was a problem; or it could just work with the string
>containing that character and report the user some kind of 40x or 50x
>error. Frameworks (like webob) would of course do the right thing and
>look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
>be optional.

The big problem I see with this approach is that any middleware that 
operates on these environment keys would have to be changed.

I think perhaps the problem here is the assumption that the environ 
dictionary has to be a straight-up copy of os.environ, when it can be 
whatever we want it to be.  If wsgiref or other CGI->WSGI gateways 
have to change to get the environ set up correctly, this is less of a 
problem than forcing everybody to rewrite their middleware and apps.


From guido at python.org  Thu Apr  2 00:51:34 2009
From: guido at python.org (Guido van Rossum)
Date: Wed, 1 Apr 2009 15:51:34 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <20090401223524.F138A3A40A7@sparrow.telecommunity.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com> 
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local> 
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com> 
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com> 
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com> 
	<20090401223524.F138A3A40A7@sparrow.telecommunity.com>
Message-ID: <ca471dc20904011551l222ede2ey878ba932cac1ab0d@mail.gmail.com>

On Wed, Apr 1, 2009 at 3:37 PM, P.J. Eby <pje at telecommunity.com> wrote:
> At 01:09 PM 4/1/2009 -0700, Guido van Rossum wrote:
>>
>> Well you could make the bytes versions available under different keys.
>> I think you do something a bit similar this in webob, e.g. req.params
>> vs. req.str_params. (Perhaps you could have QUERY_STRING and
>> QUERY_BYTES.) The decode() call used to create the text strings could
>> use 'replace' as the error handler and the app could check for the
>> presence of the replacement character ('\ufffd') in the string to see
>> if there was a problem; or it could just work with the string
>> containing that character and report the user some kind of 40x or 50x
>> error. Frameworks (like webob) would of course do the right thing and
>> look for QUERY_BYTES before QUERY_STRING. QUERY_BYTES should probably
>> be optional.
>
> The big problem I see with this approach is that any middleware that
> operates on these environment keys would have to be changed.
>
> I think perhaps the problem here is the assumption that the environ
> dictionary has to be a straight-up copy of os.environ, when it can be
> whatever we want it to be.  If wsgiref or other CGI->WSGI gateways have to
> change to get the environ set up correctly, this is less of a problem than
> forcing everybody to rewrite their middleware and apps.

Well I would assume that changing the type of these variables to bytes
would *also* cause problems for a lot of middleware.

The proposal that the bytes values should be in the 'wsgi.*' namespace
would work for me too.

Note that os.environ already has some not-entirely-solved problems
with encodings, which we currently try to pretend don't exist...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From alan at xhaus.com  Thu Apr  2 00:53:16 2009
From: alan at xhaus.com (Alan Kennedy)
Date: Wed, 1 Apr 2009 17:53:16 -0500
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
	<88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
Message-ID: <4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>

Hi Graham,

I think yours is a good solution to the problem.

[Graham]
> In other words, leave all the existing CGI variables to come through
> as latin-1 decode

As latin-1 or rfc-2047 decoded, to unicode.

> and do anything new in 'wsgi' variable namespace,

So the server provides

"wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
"wsgi.server_decoded_PATH_INFO" == u"whatever"
"wsgi.server_decode_charset" == u"utf-8"

Just my €0,02.

Alan.

From graham.dumpleton at gmail.com  Thu Apr  2 01:00:10 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 2 Apr 2009 10:00:10 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
	<88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
	<4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
Message-ID: <88e286470904011600j340bbec6m5d307ea552bb20ef@mail.gmail.com>

2009/4/2 Alan Kennedy <alan at xhaus.com>:
> Hi Graham,
>
> I think yours is a good solution to the problem.
>
> [Graham]
>> In other words, leave all the existing CGI variables to come through
>> as latin-1 decode
>
> As latin-1 or rfc-2047 decoded, to unicode.

Has anyone actually got an example of code for doing RFC-2047
decoding? Are there even any systems which make use of that encoding
for web requests anyway? I still haven't really addressed that
decoding requirement and I haven't seen any existing Python web stuff
that tries to.

>> and do anything new in 'wsgi' variable namespace,
>
> So the server provides
>
> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
> "wsgi.server_decoded_PATH_INFO" == u"whatever"
> "wsgi.server_decode_charset" == u"utf-8"

Hmmm, I thought we were talking about the 'wsgi.' variants being
bytes, i.e., only talking here about Python 3.0. The existing
SCRIPT_NAME etc. would be strings (unicode), but decoded as latin-1.

Graham

From fumanchu at aminus.org  Thu Apr  2 01:05:26 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 1 Apr 2009 16:05:26 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com><F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local><ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com><b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com><ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com><88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
	<4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407C587D5@ex10.hostedexchange.local>

Alan Kennedy wrote:
> Hi Graham,
> 
> I think yours is a good solution to the problem.
> 
> [Graham]
> > In other words, leave all the existing CGI variables to come through
> > as latin-1 decode
> 
> As latin-1 or rfc-2047 decoded, to unicode.
> 
> > and do anything new in 'wsgi' variable namespace,
> 
> So the server provides
> 
> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
> "wsgi.server_decoded_PATH_INFO" == u"whatever"
> "wsgi.server_decode_charset" == u"utf-8"

I think everyone at the sprint today acquiesced to having
SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode. The
server can decide (probably subject to configuration). I've implemented
this in the python3 branch of CherryPy and it seems to work brilliantly.
Assuming the server *is* configurable, deployers should be able to
choose Latin-1 if they need to recover the original bytes, without
having to support a separate set of encoded-byte entries.

Side note: wrapping the wsgi.input fp in a DecodingWrapper before
handing it to cgi works great, too. No need to rewrite the cgi module to
support bytes as I feared.
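
The general idea, sketched with the standard library rather than the
actual CherryPy code: io.TextIOWrapper stands in for the
DecodingWrapper here, text_input is a made-up helper, and the UTF-8
default and the BytesIO body are assumptions for the example.

```python
import io


def text_input(environ, encoding="utf-8"):
    """Wrap the bytes stream in wsgi.input so that reads return str."""
    # newline="" turns off newline translation, so "\r\n" passes through.
    return io.TextIOWrapper(environ["wsgi.input"], encoding=encoding, newline="")


# Hypothetical request body; a real server would supply its own stream.
environ = {"wsgi.input": io.BytesIO("name=caf\u00e9\r\n".encode("utf-8"))}

wrapped = text_input(environ)
body = wrapped.read()
```

TextIOWrapper expects a reasonably complete binary file object, so a
server whose wsgi.input only implements the methods WSGI requires would
need its own small wrapper instead.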


Robert Brewer
fumanchu at aminus.org


From fumanchu at aminus.org  Thu Apr  2 01:07:14 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 1 Apr 2009 16:07:14 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904011600j340bbec6m5d307ea552bb20ef@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com><F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local><ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com><b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com><ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com><88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com><4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
	<88e286470904011600j340bbec6m5d307ea552bb20ef@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407C587D9@ex10.hostedexchange.local>

Graham Dumpleton wrote:
> Has anyone actually got an example of code for doing RFC-2047
> decoding. Are there even any systems which make use of that encoding
> for web requests anyway. I still haven't really addressed that
> decoding requirement and I haven't seen any existing Python web stuff
> that tries to.

http://www.cherrypy.org/browser/trunk/cherrypy/lib/http.py#L196

Currently, CP apps call that. We can move that to the server if desired.
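
For anyone who only wants the standard library, here is a minimal
sketch built on email.header.decode_header. It is not the CherryPy
helper linked above; decode_rfc2047 is a made-up name, and falling back
to Latin-1 for parts without a charset is an assumption.

```python
from email.header import decode_header


def decode_rfc2047(value):
    """Decode RFC 2047 encoded-words in a header value to a str."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus their declared charset;
            # unlabelled bytes are treated as Latin-1 (an assumption).
            parts.append(chunk.decode(charset or "latin-1"))
        else:
            # Values with no encoded words are returned as str unchanged.
            parts.append(chunk)
    return "".join(parts)
```

For example, decode_rfc2047("=?iso-8859-1?q?p=F6stal?=") yields the
text "pöstal". An unknown charset name would raise LookupError, which
a server would presumably want to catch.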


Robert Brewer
fumanchu at aminus.org


From graham.dumpleton at gmail.com  Thu Apr  2 01:11:30 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 2 Apr 2009 10:11:30 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6407C587D5@ex10.hostedexchange.local>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
	<88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
	<4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C587D5@ex10.hostedexchange.local>
Message-ID: <88e286470904011611r1cd264ej5d7fa5c3a7377a4c@mail.gmail.com>

2009/4/2 Robert Brewer <fumanchu at aminus.org>:
> Alan Kennedy wrote:
>> Hi Graham,
>>
>> I think yours is a good solution to the problem.
>>
>> [Graham]
>> > In other words, leave all the existing CGI variables to come through
>> > as latin-1 decode
>>
>> As latin-1 or rfc-2047 decoded, to unicode.
>>
>> > and do anything new in 'wsgi' variable namespace,
>>
>> So the server provides
>>
>> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
>> "wsgi.server_decoded_PATH_INFO" == u"whatever"
>> "wsgi.server_decode_charset" == u"utf-8"
>
> I think everyone at the sprint today acquiesced to having
> SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode. The
> server can decide (probably subject to configuration). I've implemented
> this in the python3 branch of CherryPy and it seems to work brilliantly.
> Assuming the server *is* configurable, deployers should be able to
> choose Latin-1 if they need to recover the original bytes, without
> having to support a separate set of encoded-byte entries.

Seems to me that you can't have it be configurable and it must always
be latin-1 interpretation. The problem is where you are composing
multiple WSGI applications. If they each have different expectations
or requirements as to how it is handled, aren't you going to have a
problem? Or am I missing something in the way you are explaining it?

Graham

From alan at xhaus.com  Thu Apr  2 01:15:34 2009
From: alan at xhaus.com (Alan Kennedy)
Date: Wed, 1 Apr 2009 18:15:34 -0500
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <86217.1238608796@parc.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
Message-ID: <4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>

Hi Bill,

[Bill]
> I think the controlling reference here is RFC 3875.

I think the controlling references are RFC 2616, RFC 2396 and RFC 3987.

RFC 2616, the HTTP 1.1 spec, punts on the question of character
encoding for the request URI.

RFC 2396, the URI spec, says

"""
   It is expected that a systematic treatment of character encoding
   within URI will be developed as a future modification of this
   specification.
"""

RFC 3987 is that spec, for Internationalized Resource Identifiers. It says

"""
An IRI is a sequence of characters from the Universal Character Set
(Unicode/ISO 10646).
"""

and

"""
1.2.  Applicability

   IRIs are designed to be compatible with recommendations for new URI
   schemes [RFC2718].  The compatibility is provided by specifying a
   well-defined and deterministic mapping from the IRI character
   sequence to the functionally equivalent URI character sequence.
   Practical use of IRIs (or IRI references) in place of URIs (or URI
   references) depends on the following conditions being met:
"""

followed by

"""
   c.  The URI corresponding to the IRI in question has to encode
       original characters into octets using UTF-8.  For new URI
       schemes, this is recommended in [RFC2718].  It can apply to a
       whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384],
       or the URN syntax [RFC2141]).  It can apply to a specific part of
       a URI, such as the fragment identifier (e.g., [XPointer]).  It
       can apply to a specific URI or part(s) thereof.  For details,
       please see section 6.4.
"""

I think the question is "are people using IRIs in the wild"? If so,
then we must decide how we best deal with the problems of
recognising iso-8859-1+rfc2047 versus utf-8, or whatever
server-configured encoding the user has chosen.

Alan.

From fumanchu at aminus.org  Thu Apr  2 01:22:03 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 1 Apr 2009 16:22:03 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904011611r1cd264ej5d7fa5c3a7377a4c@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
	<88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
	<4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C587D5@ex10.hostedexchange.local>
	<88e286470904011611r1cd264ej5d7fa5c3a7377a4c@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407C587FD@ex10.hostedexchange.local>

Graham Dumpleton wrote:
> 2009/4/2 Robert Brewer <fumanchu at aminus.org>:
> > Alan Kennedy wrote:
> >> Hi Graham,
> >>
> >> I think yours is a good solution to the problem.
> >>
> >> [Graham]
> >> > In other words, leave all the existing CGI variables to come
> through
> >> > as latin-1 decode
> >>
> >> As latin-1 or rfc-2047 decoded, to unicode.
> >>
> >> > and do anything new in 'wsgi' variable namespace,
> >>
> >> So the server provides
> >>
> >> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
> >> "wsgi.server_decoded_PATH_INFO" == u"whatever"
> >> "wsgi.server_decode_charset" == u"utf-8"
> >
> > I think everyone at the sprint today acquiesced to having
> > SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode.
> The
> > server can decide (probably subject to configuration). I've
> implemented
> > this in the python3 branch of CherryPy and it seems to work
> brilliantly.
> > Assuming the server *is* configurable, deployers should be able to
> > choose Latin-1 if they need to recover the original bytes, without
> > having to support a separate set of encoded-byte entries.
> 
> Seems to me that you can't have it be configurable and it must always
> be latin-1 interpretation. The problem is where you are composing
> multiple WSGI applications. If they each have different expectations
> or requirements as to how it is handled, aren't you going to have a
> problem. Or am I missing something in the way you are explaining it?

I would not expect multiple middlewares to want to decode the same URI
differently. But I would assume you'd run into problems when multiple
URIs in the same site had different encodings. Mark Ramm gave the use
case of exposing Unix filenames-as-bytes in URLs: the encoding is
unknown but a human may know better.

Allowing/forcing the human to stick that information in the app or in
the server is the same work, IMO. A server could be configurable to the
point of using different encodings for different URIs via regex
matching or <Location> sections or some other means. I'd be happy with a
spec that said, "servers MUST always decode these 3 entries, but SHOULD
allow the encoding used to be configurable." I'd be equally happy with a
spec that said, "servers MUST always decode these 3 as Latin-1" and
explain why. Both have their manageable pros and cons. But delaying the
decoding to the app by setting those 3 entries as bytes has more cons
than pros.
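
A sketch of what the per-URI configuration option could look like, assuming a deployer-supplied table of regular expressions. The table, the patterns and the function name are all hypothetical, not an existing server API:

```python
import re

# Hypothetical deployer-supplied table: the first matching pattern wins.
URI_ENCODINGS = [
    (re.compile(rb"^/files/"), "latin-1"),  # e.g. legacy filenames-as-bytes
    (re.compile(rb"^/"), "utf-8"),          # default for everything else
]

def decode_path(raw_path):
    """Decode raw path bytes using the first configured pattern that matches."""
    for pattern, encoding in URI_ENCODINGS:
        if pattern.match(raw_path):
            return raw_path.decode(encoding)
    # Latin-1 never raises, so it is a safe last resort.
    return raw_path.decode("latin-1")
```

The "always Latin-1" alternative is the same function with the table removed, which is part of why both options look manageable.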


Robert Brewer
fumanchu at aminus.org


From graham.dumpleton at gmail.com  Thu Apr  2 01:42:23 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 2 Apr 2009 10:42:23 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6407C587FD@ex10.hostedexchange.local>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
	<88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
	<4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C587D5@ex10.hostedexchange.local>
	<88e286470904011611r1cd264ej5d7fa5c3a7377a4c@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C587FD@ex10.hostedexchange.local>
Message-ID: <88e286470904011642v138385a8tfe9889197fe69a3b@mail.gmail.com>

2009/4/2 Robert Brewer <fumanchu at aminus.org>:
> Graham Dumpleton wrote:
>> 2009/4/2 Robert Brewer <fumanchu at aminus.org>:
>> > Alan Kennedy wrote:
>> >> Hi Graham,
>> >>
>> >> I think yours is a good solution to the problem.
>> >>
>> >> [Graham]
>> >> > In other words, leave all the existing CGI variables to come
>> through
>> >> > as latin-1 decode
>> >>
>> >> As latin-1 or rfc-2047 decoded, to unicode.
>> >>
>> >> > and do anything new in 'wsgi' variable namespace,
>> >>
>> >> So the server provides
>> >>
>> >> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
>> >> "wsgi.server_decoded_PATH_INFO" == u"whatever"
>> >> "wsgi.server_decode_charset" == u"utf-8"
>> >
>> > I think everyone at the sprint today acquiesced to having
>> > SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode.
>> The
>> > server can decide (probably subject to configuration). I've
>> implemented
>> > this in the python3 branch of CherryPy and it seems to work
>> brilliantly.
>> > Assuming the server *is* configurable, deployers should be able to
>> > choose Latin-1 if they need to recover the original bytes, without
>> > having to support a separate set of encoded-byte entries.
>>
>> Seems to me that you can't have it be configurable and it must always
>> be latin-1 interpretation. The problem is where you are composing
>> multiple WSGI applications. If they each have different expectations
>> or requirements as to how it is handled, aren't you going to have a
>> problem. Or am I missing something in the way you are explaining it?
>
> I would not expect multiple middlewares to want to decode the same URI
> differently.

I was not thinking about multiple middlewares, but multiple distinct
WSGI applications (end consumer, not middleware) composited together
by something like Paste cascade, Pylons configuration or even
something like a routes based dispatcher.

In the case of something like cascade they aren't necessarily on
different URLs. For the latter they would be; even so, I am just making
sure that having different URLs with different encodings isn't going to
be an issue in respect of mapping middleware. So long as code/config
files are always UTF-8 encoded and capable of representing any
possible decodings of URL, then probably okay.

Graham

From alan at xhaus.com  Thu Apr  2 01:43:32 2009
From: alan at xhaus.com (Alan Kennedy)
Date: Wed, 1 Apr 2009 18:43:32 -0500
Subject: [Web-SIG] WSGI Open Space @ PyCon.
In-Reply-To: <e91cc0270903282214x333c2624lfd9fbdbd0ae68313@mail.gmail.com>
References: <4a951aa00903271330w48055728i582263fcf67687e5@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407B53581@ex10.hostedexchange.local>
	<F1962646D3B64642B7C9A06068EE1E6407B5361E@ex10.hostedexchange.local>
	<e91cc0270903282214x333c2624lfd9fbdbd0ae68313@mail.gmail.com>
Message-ID: <4a951aa00904011643u54e667dy745f94b8a26191dc@mail.gmail.com>

[Noah]
> +1 on the iterator, although I might just like the idea and might be missing
> something important. It seems like there are a lot of powerful things being
> developed with generators in mind, and there are some nifty things you can
> do with them like the contextlib example:
> http://docs.python.org/library/contextlib.html#contextlib.closing

Indeed, like coroutines.

http://www.python.org/dev/peps/pep-0342/

[Robert]
>> The counter-argument was that
>> servers could use non-blocking sockets to allow apps which read() to
>> yield in the case of no immediate data rather than block indefinitely.

Ah, but the problem with that is that one can't magically suspend
methods like that and return control to the scheduler, without using
coroutines or stackless.

Who does the read() method return control to when there's no data
available (i.e. no bytes on the socket)? If wsgi.input is a simple
file-like object, then its methods must be coded to recognise, rather
than blocking, when the data is not yet available to fulfill the
application's expectation. How does it know how to return control to
the scheduler, instead of the application?

If the application expects to receive all of the data that it asked
for with, say, a read(1024) call, it has to be prepared to accept that
it may get less than 1024 bytes, in an asynchronous situation. What
does it return to the application in the case when fewer than 1024
bytes are available?
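
To make the short-read problem concrete, here is a sketch of the loop an application would need if read(n) may return fewer than n bytes. TrickleStream is a made-up stand-in for such a stream. Note that this loop still blocks the caller, so it does not answer the question of how control gets back to a scheduler:

```python
import io

class TrickleStream:
    """Hypothetical stream that never returns more than 3 bytes per read()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        limit = 3 if size < 0 else min(size, 3)
        return self._buf.read(limit)

def read_exactly(stream, size):
    """Keep calling read() until `size` bytes arrive or the stream ends."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # b"" signals end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

body = read_exactly(TrickleStream(b"abcdefghij"), 8)
```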

>> If a file-like object were retained, it would help to publish a
>> chainable file example to help middleware re-stream files they read any
>> part of.

I don't think that re-streaming of input should be a part of the spec;
it's an application layer thing. We don't expect to re-stream the
output of an application: why re-stream the input?

If some application needs to examine the entire byte sequence for
whatever reasons, that's a special case that can be catered for with
itertools, and dedicated middleware.
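
A sketch of the itertools approach, assuming the input is available as an iterator of byte chunks rather than a file-like object. The function name is illustrative:

```python
import itertools

def peek_and_restream(chunks, n=1):
    """Read the first n chunks, then return (peeked, iterator over all chunks)."""
    chunks = iter(chunks)
    peeked = list(itertools.islice(chunks, n))
    # Chain the consumed chunks back in front of the untouched remainder.
    return peeked, itertools.chain(peeked, chunks)

peeked, restreamed = peek_and_restream([b"head", b"body", b"tail"])
```

Middleware can inspect peeked and hand restreamed to the next application, which then sees the complete sequence.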

>> Continuing deferred issues

>> * Lifecycle methods (start/stop/etc event API driven by the container)

I'd really like to get this one nailed: java people and .net people
expect this stuff.

Alan.

From pje at telecommunity.com  Thu Apr  2 03:54:38 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Wed, 01 Apr 2009 21:54:38 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904011611r1cd264ej5d7fa5c3a7377a4c@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
	<88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
	<4a951aa00904011553g53cb8ca2pbe1f98869c7949cb@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C587D5@ex10.hostedexchange.local>
	<88e286470904011611r1cd264ej5d7fa5c3a7377a4c@mail.gmail.com>
Message-ID: <20090402015213.DCADE3A40A7@sparrow.telecommunity.com>

At 10:11 AM 4/2/2009 +1100, Graham Dumpleton wrote:
>Seems to me that you can't have it be configurable and it must always
>be latin-1 interpretation. The problem is where you are composing
>multiple WSGI applications. If they each have different expectations
>or requirements as to how it is handled, aren't you going to have a
>problem.

Agreed.  Configuration and duplication are both evil in this context.


From janssen at parc.com  Thu Apr  2 04:00:53 2009
From: janssen at parc.com (Bill Janssen)
Date: Wed, 1 Apr 2009 19:00:53 PDT
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
Message-ID: <91243.1238637653@parc.com>

Alan Kennedy <alan at xhaus.com> wrote:

> Hi Bill,
> 
> [Bill]
> > I think the controlling reference here is RFC 3875.
> 
> I think the controlling references are RFC 2616, RFC 2396 and RFC 3987.

I see what you're saying, but it's darn near impossible, as a practical
matter, to get any guidance on encoding matters from those.

The question is where those names come from, and they come from CGI, and
that is (practically speaking) defined these days by RFC 3875, as much as
anything.

> I think the question is "are people using IRIs in the wild"? If so,
> then we must decide how do we best deal with the problems of
> recognising iso-8859-1+rfc2047 versus utf-8, or whatever
> server-configured encoding the user has chosen.

See http://bugs.python.org/issue3300, where we went around and around
that question.  The answer seems to be, yes.

There are lots of useful fragments in that discussion, for instance:

``For the authority (server name) portion of a URI, RFC 3986 is
pretty clear that UTF-8 must be used for non-ASCII values (assuming, for
a moment, that IDNA addresses are not Punycode encoded already). For
the path portion of URIs, a large-ish proportion of them are, indeed,
UTF-8 encoded because that has been the de facto standard in Web browsers
for a number of years now. For the query and fragment parts, however,
the encoding is determined by context and often depends on the encoding
of some page that contains the form from which the data is taken. Thus,
a large number of URIs contain non-UTF-8 percent-encoded octets.''

Bill

From graham.dumpleton at gmail.com  Thu Apr  2 06:01:17 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 2 Apr 2009 15:01:17 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <91243.1238637653@parc.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
Message-ID: <88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>

2009/4/2 Bill Janssen <janssen at parc.com>:
> Alan Kennedy <alan at xhaus.com> wrote:
>
>> Hi Bill,
>>
>> [Bill]
>> > I think the controlling reference here is RFC 3875.
>>
>> I think the controlling references are RFC 2616, RFC 2396 and RFC 3987.
>
> I see what you're saying, but it's darn near impossible, as a practical
> matter, to get any guidance on encoding matters from those.
>
> The question is where those names come from, and they come from CGI, and
> that is (practically speaking) defined these days by RFC 3875, as much as
> anything.
>
>> I think the question is "are people using IRIs in the wild"? If so,
>> then we must decide how do we best deal with the problems of
>> recognising iso-8859-1+rfc2047 versus utf-8, or whatever
>> server-configured encoding the user has chosen.
>
> See http://bugs.python.org/issue3300, where we went around and around
> that question. The answer seems to be, yes.
>
> There are lots of useful fragments in that discussion, for instance:
>
> ``For the authority (server name) portion of a URI, RFC 3986 is
> pretty clear that UTF-8 must be used for non-ASCII values (assuming, for
> a moment, that IDNA addresses are not Punycode encoded already). For
> the path portion of URIs, a large-ish proportion of them are, indeed,
> UTF-8 encoded because that has been the de facto standard in Web browsers
> for a number of years now. For the query and fragment parts, however,
> the encoding is determined by context and often depends on the encoding
> of some page that contains the form from which the data is taken. Thus,
> a large number of URIs contain non-UTF-8 percent-encoded octets.''

Reading that bug detail (very long) reminds me of another sticky
issue that was brought up before, which is the Referer (request) and
Location (response) headers. These being URLs means you have to deal
with the issue of encoding in the URL within a header.

Is there going to be any simple answer to all of this? :-(

Graham

From sh at defuze.org  Thu Apr  2 08:56:59 2009
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Thu, 2 Apr 2009 08:56:59 +0200 (CEST)
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
Message-ID: <54714.193.253.216.132.1238655419.squirrel@mail1.webfaction.com>

Hi All,

> Is there going to be any simple answer to all of this? :-(
>

Would there be any interest in asking the HTTP-BIS working group [1] what
they think about it?

Currently I couldn't find anything in their drafts suggesting they had
decided to clarify this issue from a protocol's perspective but they might
consider it to be relevant to their goals.

- Sylvain

[1] http://www.ietf.org/html.charters/httpbis-charter.html

-- 
Sylvain Hellegouarch
http://www.defuze.org

From alan at xhaus.com  Thu Apr  2 13:19:34 2009
From: alan at xhaus.com (Alan Kennedy)
Date: Thu, 2 Apr 2009 06:19:34 -0500
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <54714.193.253.216.132.1238655419.squirrel@mail1.webfaction.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
	<54714.193.253.216.132.1238655419.squirrel@mail1.webfaction.com>
Message-ID: <4a951aa00904020419pe98287ds9443f3bb32c03f27@mail.gmail.com>

[Sylvain]
> Would there be any interest in asking the HTTP-BIS working group [1] what
> they think about it?
>
> Currently I couldn't find anything in their drafts suggesting they had
> decided to clarify this issue from a protocol's perspective but they might
> consider it to be relevant to their goals.
>
> - Sylvain
>
> [1] http://www.ietf.org/html.charters/httpbis-charter.html

I checked the current version of their replacement for RFC 2616. It says

"""
2.1.3.  URI Comparison

   When comparing two URIs to decide if they match or not, a client
   SHOULD use a case-sensitive octet-by-octet comparison of the entire
   URIs
"""

Which doesn't work if the two URIs to be compared are in different encodings.
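
A small illustration of the mismatch, using the standard library's urllib.parse. The path "/café" is just an example value:

```python
from urllib.parse import quote, unquote_to_bytes

# The same characters percent-encoded under two different charsets.
utf8_uri = quote("/caf\u00e9", encoding="utf-8")      # '/caf%C3%A9'
latin1_uri = quote("/caf\u00e9", encoding="latin-1")  # '/caf%E9'

# An octet-by-octet comparison treats them as different resources.
same_octets = unquote_to_bytes(utf8_uri) == unquote_to_bytes(latin1_uri)
```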

I did find this page on the W3C site which at least explains the
issues, and does a survey of existing modern browsers for how they
encode URIs and IRIs.

http://www.w3.org/International/articles/idn-and-iri/

"""
Paths

The conversion process for parts of the IRI relating to the path is
already supported natively in the latest versions of IE7, Firefox,
Opera, Safari and Google Chrome.

It works in Internet Explorer 6 if the option in Tools>Internet
Options>Advanced>Always send URLs as UTF-8 is turned on. This means
that links in HTML, or addresses typed into the browser's address bar
will be correctly converted in those user agents. It doesn't work out
of the box for Firefox 2 (although you may obtain results if the IRI
and the resource name are in the same encoding), but technically-aware
users can turn on an option to support this (set
network.standard-url.encode-utf8 to true in about:config).

Whether or not the resource is found on the server, however, is a
different question. If the file system is in UTF-8, there should be no
problem. If not, and no mechanism is available to convert addresses
from UTF-8 to the appropriate encoding, the request will fail.

Files are normally exposed as UTF-8 by servers such as IIS and Apache
2 on Windows and Mac OS X. Unix and Linux users can store file names
in UTF-8, or use the mod_fileiri module mentioned earlier. Version 1
of the Apache server doesn't yet expose filenames as UTF-8.

You can run a basic check whether it works for your client and
resource using this simple test.

Note that, while the basics may work, there are other somewhat more
complicated aspects of IRI support, such as handling of bidirectional
text in Arabic or Hebrew, which may need some additional time for full
implementation.
"""

Alan.

From graham.dumpleton at gmail.com  Thu Apr  2 13:33:07 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 2 Apr 2009 22:33:07 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
Message-ID: <88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>

2009/4/2 Graham Dumpleton <graham.dumpleton at gmail.com>:
> Is there going to be any simple answer to all of this? :-(

I am slowly working through what I think I at least need to do for
Apache/mod_wsgi. I'll give a summary of what I have worked out so far
based on the discussions and my own research.

Just so I have a list of things to check off, I include an example
WSGI environment from a request and make comments about each category
of things from it.

First off is CGI HTTP variables.

HTTP_ACCEPT: 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'
HTTP_ACCEPT_ENCODING: 'gzip, deflate'
HTTP_ACCEPT_LANGUAGE: 'en-us'
HTTP_CONNECTION: 'keep-alive'
HTTP_HOST: 'home.dscpl.com.au'
HTTP_USER_AGENT: 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6;
en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1
Safari/525.27.1'

The rule here from WSGI 1.0 amendments page in relation to Python 3.0 is:

"""When running under Python 3, servers MUST provide CGI HTTP
variables as strings, decoded from the headers using HTTP standard
encodings (i.e. latin-1 + RFC 2047)"""

Which is fair enough and basically what the RFCs say. At the moment I
don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so just
need to do that.

An interesting one here to note is HTTP_HOST. The issue with this one
is what would happen for a unicode host name. For Apache an IDNA
(RFC3490) encoded host name has to be used to identify a site with
unicode host name. That is, one uses the IDNA name for ServerName or
ServerAlias directives.

When one gets a request one would actually see the IDNA name for
HTTP_HOST and that only uses latin-1 characters. For example:

  HTTP_HOST: 'xn--wgbe9chb01aytce.com'

These resolve in DNS okay:

  $ nslookup xn--wgbe9chb01aytce.com
  Server:		192.168.1.254
  Address:	192.168.1.254#53

  Non-authoritative answer:
  Name:	xn--wgbe9chb01aytce.com
  Address: 208.78.242.184

Using HTTP live headers on Firefox can also confirm that that is what
would be sent:

  Host: xn--wgbe9chb01aytce.com

My understanding is that if an actual unicode string is given to a
browser, it should translate it to the IDNA name before use.
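
For reference, the translation can be reproduced with Python's built-in "idna" codec, which implements RFC 3490. The host name here is an example, not the one from the request above:

```python
# Hypothetical unicode host name ("bücher.example").
unicode_host = "b\u00fccher.example"

# Encoding yields the ASCII-only form that ends up in the Host header.
idna_host = unicode_host.encode("idna")

# Decoding reverses the mapping.
round_tripped = idna_host.decode("idna")
```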

Next HTTP header to worry about is HTTP_REFERER.

There would be two parts to this, there would be the host name
component and then the path component.

We already know from above that for unicode host name it should be the
IDNA name.

For the path component, if the client follows the rules properly, then
if the path uses a non latin-1 encoding, it should be using RFC 2047
to indicate this, so we shouldn't have to do anything different and can
use the same rule as for other HTTP headers. For this header we are
actually in a better situation than for the URL in the actual HTTP
request line, which isn't so specific about encodings.

GATEWAY_INTERFACE: 'CGI/1.1'
SERVER_PROTOCOL: 'HTTP/1.1'

Standard stuff which is always going to be latin-1, so decode as that.

REMOTE_ADDR: '192.168.1.5'
REMOTE_PORT: '51378'
SERVER_PORT: '80'
SERVER_ADDR: '192.168.1.5'

Again, latin-1 is okay.

SERVER_SOFTWARE: 'Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.7l
DAV/2 mod_wsgi/3.0-TRUNK Python/2.5.1'

Again, latin-1 is okay as Apache modules internally can only supply
normal C strings to add stuff to this.

SERVER_NAME: 'home.dscpl.com.au'

Same as HTTP_HOST and if a unicode host name would be IDNA encoded, so
can use latin-1 okay.

SERVER_ADMIN: 'you at example.com'

This is set by the ServerAdmin directive. Because the Apache configuration
is effectively latin-1, you probably can't even define a non latin-1 email
address. The host part is probably IDNA encoded anyway, so the restriction
to latin-1 is perhaps only pertinent to the user part of the email address.
So, latin-1 should be okay.

SERVER_SIGNATURE: ''

Depending on Apache configuration can be server name and version
information or server admin email address. All latin-1.

DOCUMENT_ROOT: '/Library/WebServer/Documents'
SCRIPT_FILENAME: '/Users/grahamd/Sites/echo.wsgi'

These are file system paths, and since the Apache Runtime Library used
for Apache 2.X has a define for whether file system supports unicode,
can say:

  #if APR_HAS_UNICODE_FS
        charset = "UTF-8";
  #else
        charset = "ISO-8859-1";
  #endif

For Apache 1.3, which doesn't have that define AFAIK, might just have
to assume latin-1, but possibly another way of doing it, or Apache 1.3
might have its own define for it.

PATH: '/usr/bin:/bin:/usr/sbin:/sbin'

Presume I can use APR_HAS_UNICODE_FS check again even though it is a
combination of paths.

REQUEST_METHOD: 'GET'

Presume they will always use latin-1 for these.

All that is now left is the following, which we have already been discussing.

REQUEST_URI: '/~grahamd/echo.wsgi'
SCRIPT_NAME: '/~grahamd/echo.wsgi'
PATH_INFO: ''
QUERY_STRING: ''

At least I am happy that except for these four, that there shouldn't
be any issues.

I'll keep watching what others come up with in respect of these and
see what consensus develops. :-)

Graham

From alan at xhaus.com  Thu Apr  2 14:49:08 2009
From: alan at xhaus.com (Alan Kennedy)
Date: Thu, 2 Apr 2009 07:49:08 -0500
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <54714.193.253.216.132.1238655419.squirrel@mail1.webfaction.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
	<54714.193.253.216.132.1238655419.squirrel@mail1.webfaction.com>
Message-ID: <4a951aa00904020549q30c3f088ja00b27cc0b9dd4c8@mail.gmail.com>

[Sylvain]
> Would there be any interest in asking the HTTP-BIS working group [1] what
> they think about it?
>
> Currently I couldn't find anything in their drafts suggesting they had
> decided to clarify this issue from a protocol's perspective but they might
> consider it to be relevant to their goals.
>
> - Sylvain
>
> [1] http://www.ietf.org/html.charters/httpbis-charter.html

As mentioned in an earlier post, I think their current spec avoids the
issue, by still relying on "octet-by-octet" comparison.

But I did come across this discussion on their list, which goes into
all of the issues in fine detail.

http://www.nabble.com/PROPOSAL%3A-i74%3A-Encoding-for-non-ASCII-headers-tt16274487.html#a16291951

Quote of the thread

[Roy Fielding]
> We are simply passing through the one and only defined i18n solution
> for HTTP/1.1 because it was the only solution available in 1994.
> If email clients can (and do) implement it, then so can WWW clients.
>
> People who want to fix that should start queueing for HTTP/1.2.

Alan.

From foom at fuhm.net  Thu Apr  2 17:45:45 2009
From: foom at fuhm.net (James Y Knight)
Date: Thu, 2 Apr 2009 11:45:45 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
Message-ID: <F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>

On Apr 2, 2009, at 7:33 AM, Graham Dumpleton wrote:

> """When running under Python 3, servers MUST provide CGI HTTP
> variables as strings, decoded from the headers using HTTP standard
> encodings (i.e. latin-1 + RFC 2047)"""
>
> Which is fair enough and basically what the RFCs say. At the moment I
> don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so just
> need to do that.

I'd really *really* like to recommend that any mention of RFC 2047 is  
stricken from the WSGI server requirements. I cannot imagine that  
decoding actually accomplishing anything other than opening security  
holes (think a filter in an upstream proxy that doesn't know how to do  
2047-decoding passing something through that you now decode.)

Also, you have to only do the decoding on TEXT words according to the  
spec, so the WSGI container now needs an HTTP header parser just in  
order to determine where it should decode RFC2047 words and where not  
to? I don't think so...

James


From tseaver at palladion.com  Thu Apr  2 19:36:53 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 02 Apr 2009 13:36:53 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>	<86217.1238608796@parc.com>	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>	<91243.1238637653@parc.com>	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
Message-ID: <gr2t3u$c72$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Graham Dumpleton wrote:
> 2009/4/2 Graham Dumpleton <graham.dumpleton at gmail.com>:
>> Is there going to be any simple answer to all of this? :-(
> 
> I am slowly working through what I think I at least need to do for
> Apache/mod_wsgi. I'll give a summary of what I have worked out so far
> based on the discussions and my own research.
> 
> Just so I have a list of things to check off, I include an example
> WSGI environment from a request and make comments about each category
> of things from it.
> 
> First off is CGI HTTP variables.
> 
> HTTP_ACCEPT: 'text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5'
> HTTP_ACCEPT_ENCODING: 'gzip, deflate'
> HTTP_ACCEPT_LANGUAGE: 'en-us'
> HTTP_CONNECTION: 'keep-alive'
> HTTP_HOST: 'home.dscpl.com.au'
> HTTP_USER_AGENT: 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_6;
> en-us) AppleWebKit/525.27.1 (KHTML, like Gecko) Version/3.2.1
> Safari/525.27.1'
> 
> The rule here from WSGI 1.0 amendments page in relation to Python 3.0 is:
> 
> """When running under Python 3, servers MUST provide CGI HTTP
> variables as strings, decoded from the headers using HTTP standard
> encodings (i.e. latin-1 + RFC 2047)"""
> 
> Which is fair enough and basically what the RFCs say. At the moment I
> don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so just
> need to do that.
> 
> An interesting one here to note is HTTP_HOST. The issue with this one
> is what would happen for a unicode host name. For Apache an IDNA
> (RFC3490) encoded host name has to be used to identify a site with
> unicode host name. That is, one uses the IDNA name for ServerName or
> ServerAlias directives.
> 
> When one gets a request one would actually see the IDNA name for
> HTTP_HOST and that only uses latin-1 characters. For example:
> 
>   HTTP_HOST: 'xn--wgbe9chb01aytce.com'
> 
> These resolve in DNS okay:
> 
>   $ nslookup xn--wgbe9chb01aytce.com
>   Server:		192.168.1.254
>   Address:	192.168.1.254#53
> 
>   Non-authoritative answer:
>   Name:	xn--wgbe9chb01aytce.com
>   Address: 208.78.242.184
> 
> Using HTTP live headers on Firefox can also confirm that that is what
> would be sent:
> 
>   Host: xn--wgbe9chb01aytce.com
> 
> My understanding is that if a actual unicode string is given to a
> browser, that it should translate it to the IDNA name before use.

That is what the RFCs require; besides, un-encoded unicode can't be
written onto a socket.
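
To make the IDNA step concrete, here is a rough Python 3 sketch using the
standard library's "idna" codec. The host name is made up for illustration,
and this is not what Apache or any browser literally runs; it only shows
that the encoded form is plain ASCII.

```python
# Hypothetical unicode host name ("bücher.example"), for illustration only.
unicode_host = "b\u00fccher.example"

# The stdlib "idna" codec (RFC 3490) converts each label to its ASCII form.
wire_host = unicode_host.encode("idna")
print(wire_host)

# The result is plain ASCII bytes, so decoding it as latin-1 loses nothing.
header_value = wire_host.decode("latin-1")

# Decoding with the same codec recovers the original unicode name.
round_trip = wire_host.decode("idna")
```

So a server that decodes HTTP_HOST as latin-1 never sees anything outside
ASCII for a well-behaved client; turning the name back into unicode is a
separate, optional step.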

> Next HTTP header to worry about is HTTP_REFERRER.
> 
> There would be two parts to this, there would be the host name
> component and then the path component.
> 
> We already know from above that for unicode host name it should be the
> IDNA name.
> 
> For the path component, if the client follows the rules properly, then
> if the path uses a non latin-1 encoding, then it should be using RFC
> 2047 to indicate this so shouldn't have to do anything different and
> use same rule as other HTTP headers. For this header we are actually
> in a better situation that for URL in actual HTTP request line which
> isn't so specific about encodings.
> 
> GATEWAY_INTERFACE: 'CGI/1.1'
> SERVER_PROTOCOL: 'HTTP/1.1'
> 
> Standard stuff which is always going to be latin-1, so encode as that.

I think you mean 'decode' here?  Unicode strings are encoded to get
bytes;  bytes are decoded to get unicode strings.

Also, I don't know of any reason why those values would be anything but ASCII.
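
A minimal Python 3 sketch of the direction of each operation, and of why
latin-1 is a safe choice for values that are really ASCII. The variable
names are just for illustration.

```python
raw = b"HTTP/1.1"                # bytes, as they arrive from the socket
text = raw.decode("latin-1")     # bytes -> str is "decode"
again = text.encode("latin-1")   # str -> bytes is "encode"

# latin-1 maps every byte 0..255 to the code point of the same number,
# so the round trip is lossless for any byte string at all.
all_bytes = bytes(range(256))
lossless = all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# For pure-ASCII values such as this one, ASCII and latin-1 decoding agree.
same_as_ascii = raw.decode("ascii") == text
```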

> REMOTE_ADDR: '192.168.1.5'
> REMOTE_PORT: '51378'
> SERVER_PORT: '80'
> SERVER_ADDR: '192.168.1.5'
> 
> Again, latin-1 is okay.

Likewise, these can't be anything but ASCII.

> SERVER_SOFTWARE: 'Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.7l
> DAV/2 mod_wsgi/3.0-TRUNK Python/2.5.1'
> 
> Again, latin-1 is okay as Apache modules internally can only supply
> normal C strings to add stuff to this.
> 
> SERVER_NAME: 'home.dscpl.com.au'
> 
> Same as HTTP_HOST and if a unicode host name would be IDNA encoded, so
> can use latin-1 okay.
> 
> SERVER_ADMIN: 'you at example.com'
> 
> This is set by ServerAdmin directive. Because in Apache configuration
> is effectively latin-1, probably can't even define a non latin-1 email
> address. For host part, probably IDNA encoded anyway, so restriction
> on latin-1 only perhaps pertinent to user part of email address. So,
> latin-1 should be okay.
> 
> SERVER_SIGNATURE: ''
> 
> Depending on Apache configuration can be server name and version
> information or server admin email address. All latin-1.
> 
> DOCUMENT_ROOT: '/Library/WebServer/Documents'
> SCRIPT_FILENAME: '/Users/grahamd/Sites/echo.wsgi'
> 
> These are file system paths, and since the Apache Runtime Library used
> for Apache 2.X has a define for whether file system supports unicode,
> can say:
> 
>   #if APR_HAS_UNICODE_FS
>         charset = "UTF-8";
>   #else
>         charset = "ISO-8859-1";
>   #endif

I'm not sure that works for arbitrary filesystem configurations:  some
parts of the tree may be mounted from locations with different
encodings.  See David Wheeler's analysis for more:

 http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
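
As a sketch of the failure mode, assuming a path whose last component was
written in latin-1 on a system the server believes is UTF-8 (the path is
made up). The "surrogateescape" error handler used below comes from PEP 383
and only exists from Python 3.1 onward, so it post-dates the Python 3.0
being discussed here; it is shown only as one possible way to keep such
names round-trippable.

```python
# Hypothetical path whose last component is latin-1 encoded, not UTF-8.
raw_path = b"/srv/www/caf\xe9.wsgi"

# A strict decode with the "wrong" charset guess simply fails.
try:
    raw_path.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "surrogateescape" carries undecodable bytes through as lone surrogates,
text_path = raw_path.decode("utf-8", "surrogateescape")
# and encoding with the same handler restores the original bytes exactly.
restored = text_path.encode("utf-8", "surrogateescape")
```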

> For Apache 1.3, which doesn't have that define AFAIK, might just have
> to assume latin-1, but possibly another way of doing it, or Apache 1.3
> might have its own define for it.
> 
> PATH: '/usr/bin:/bin:/usr/sbin:/sbin'
> 
> Presume I can use APR_HAS_UNICODE_FS check again even though it is a
> combination of paths.
> 
> REQUEST_METHOD: 'GET'
> 
> Presume they will always use latin-1 for these.

RFC 2616, section 5.1.1 defines only ASCII methods;  extension methods
are 'tokens', which must also be printable ASCII w/o separators
(section 2.2).
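
For reference, the token rule can be sketched in a few lines of Python. The
helper name is_token is made up for this example; the character sets follow
my reading of RFC 2616 section 2.2 (CHAR minus CTLs minus separators).

```python
# Separators from RFC 2616 section 2.2, including SP and HT.
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')

# token = 1*<any CHAR except CTLs or separators>
# CHAR is US-ASCII 0..127; CTLs are 0..31 and 127, so start from 33
# (32 is SP, which is a separator anyway) and stop before 127.
TOKEN_CHARS = frozenset(chr(c) for c in range(33, 127)) - SEPARATORS


def is_token(value):
    """Return True if value is a non-empty RFC 2616 token."""
    return bool(value) and all(ch in TOKEN_CHARS for ch in value)
```

Anything that passes this check is ASCII by construction, so decoding
REQUEST_METHOD as latin-1 (or ASCII) cannot go wrong for a valid request.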

> All that is now left is the following, which we have already been discussing.
> 
> REQUEST_URI: '/~grahamd/echo.wsgi'
> SCRIPT_NAME: '/~grahamd/echo.wsgi'
> PATH_INFO: ''
> QUERY_STRING: ''
> 
> At least I am happy that except for these four, that there shouldn't
> be any issues.
> 
> I'll keep watching what others come up with in respect of these and
> see what consensus develops. :-)


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ1Pe1+gerLs4ltQ4RArt6AJ9GMmvjQd6LfH4MSC1yzNUTO6r51ACg3Ocl
3bOgMrQUlFy+ZSehv8gsSLM=
=r4vt
-----END PGP SIGNATURE-----


From tseaver at palladion.com  Thu Apr  2 19:40:38 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 02 Apr 2009 13:40:38 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>	<86217.1238608796@parc.com>	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>	<91243.1238637653@parc.com>	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
	<F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>
Message-ID: <49D4F896.5010302@palladion.com>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James Y Knight wrote:
> On Apr 2, 2009, at 7:33 AM, Graham Dumpleton wrote:
> 
>> """When running under Python 3, servers MUST provide CGI HTTP
>> variables as strings, decoded from the headers using HTTP standard
>> encodings (i.e. latin-1 + RFC 2047)"""
>>
>> Which is fair enough and basically what the RFCs say. At the moment I
>> don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so just
>> need to do that.
> 
> I'd really *really* like to recommend that any mention of RFC 2047 is  
> stricken from the WSGI server requirements. I cannot imagine that  
> decoding actually accomplishing anything other than opening security  
> holes (think a filter in an upstream proxy that doesn't know how to do  
> 2047-decoding passing something through that you now decode.)
> 
> Also, you have to only do the decoding on TEXT words according to the  
> spec, so the WSGI container now needs an HTTP header parser just in  
> order to determine where it should decode RFC2047 words and where not  
> to? I don't think so...

Couldn't the spec mandate that decoding RFC 2047 headers is the
responsibility of the non-middleware WSGI server?  I agree that
middleware and applications shouldn't know or care about that problem.
Under Python 2.x, the server would transcode those values to the
"common" encoding used for all values in the WSGI environment;  under
Python 3.x, it would just decode them to unicode.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ1PiW+gerLs4ltQ4RAhUmAJ94N6nC+Lh5qPX2Zrz2zAmZgZlnPgCfVZYU
Z0xaYW6NwFJ35Xa11HRXuDw=
=w/6q
-----END PGP SIGNATURE-----


From foom at fuhm.net  Thu Apr  2 20:09:19 2009
From: foom at fuhm.net (James Y Knight)
Date: Thu, 2 Apr 2009 14:09:19 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <49D4F896.5010302@palladion.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>	<86217.1238608796@parc.com>	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>	<91243.1238637653@parc.com>	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
	<F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>
	<49D4F896.5010302@palladion.com>
Message-ID: <982981EC-02C1-4C85-AAC9-3CAA0D3721E9@fuhm.net>


On Apr 2, 2009, at 1:40 PM, Tres Seaver wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> James Y Knight wrote:
>> On Apr 2, 2009, at 7:33 AM, Graham Dumpleton wrote:
>>
>>> """When running under Python 3, servers MUST provide CGI HTTP
>>> variables as strings, decoded from the headers using HTTP standard
>>> encodings (i.e. latin-1 + RFC 2047)"""
>>>
>>> Which is fair enough and basically what the RFCs say. At the  
>>> moment I
>>> don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so  
>>> just
>>> need to do that.
>>
>> I'd really *really* like to recommend that any mention of RFC 2047 is
>> stricken from the WSGI server requirements. I cannot imagine that
>> decoding actually accomplishing anything other than opening security
>> holes (think a filter in an upstream proxy that doesn't know how to  
>> do
>> 2047-decoding passing something through that you now decode.)
>>
>> Also, you have to only do the decoding on TEXT words according to the
>> spec, so the WSGI container now needs an HTTP header parser just in
>> order to determine where it should decode RFC2047 words and where not
>> to? I don't think so...
>
> Couldn't the spec mandate that decoding RFC 2047 headers is the
> responsibility of the non-middleware WSGI server?  I agree that
> middleware and applications shouldn't know or care about that
> problem.
> Under Python 2.x, the server would transcode those values to the
> "common" encoding used for all values in the WSGI environment;  under
> Python 3.x, it would just decode them to unicode.
>

I think you're saying you agree with exactly the opposite of what I  
meant. The server/gateway (aka apache mod_wsgi) *must not* be required  
to handle RFC2047 decoding. Only the application (or a header parsing  
library that the application uses) can possibly handle this properly.

That's why I think it should not be mentioned at all in the WSGI  
requirements for the server.

Furthermore, although they certainly can if they want, I'd recommend  
that no applications actually bother with doing such decoding, since  
RFC2047 words in http headers are essentially never used.

james

From tseaver at palladion.com  Thu Apr  2 20:33:08 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 02 Apr 2009 14:33:08 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <982981EC-02C1-4C85-AAC9-3CAA0D3721E9@fuhm.net>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>	<86217.1238608796@parc.com>	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>	<91243.1238637653@parc.com>	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>	<F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>	<49D4F896.5010302@palladion.com>
	<982981EC-02C1-4C85-AAC9-3CAA0D3721E9@fuhm.net>
Message-ID: <gr30dd$odj$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James Y Knight wrote:

> I think you're saying you agree with exactly the opposite of what I
> meant. The server/gateway (aka apache mod_wsgi) *must not* be required
> to handle RFC2047 decoding. Only the application (or a header parsing
> library that the application uses) can possibly handle this properly.

I don't understand why:  if RFC2047 values are being passed as HTTP
headers, then surely the server has enough information to decode them,
and to ensure that they are re-encoded into the same encoding as all
other WSGI environment variables (under Python 2.x).

Ensuring that the environment variables are uniformly encoded (or
decoded to unicode, in Python 3.x) seems like it *must* be the server's
responsibility:  only the server can know how some values (e.g., those
derived from filesystem paths, or its config file) are encoded.  Moving
that responsibility to the application just means that it won't be met,
because the application won't have enough information to do the job.

> That's why I think it should not be mentioned at all in the WSGI
> requirements for the server.
>
> Furthermore, although they certainly can if they want, I'd recommend
> that no applications actually bother with doing such decoding, since
> RFC2047 words in http headers are essentially never used.

In that case, it would be moot whether the server or the application
does (not) do the decoding.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ1QTk+gerLs4ltQ4RApDgAJ4olI0e3Jh1diP9P6se5RR3mfFFIACaA05t
n8UK1XWG2ibMTiqXEeGr6mw=
=JNXk
-----END PGP SIGNATURE-----


From foom at fuhm.net  Thu Apr  2 21:56:44 2009
From: foom at fuhm.net (James Y Knight)
Date: Thu, 2 Apr 2009 15:56:44 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <gr30dd$odj$1@ger.gmane.org>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>	<86217.1238608796@parc.com>	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>	<91243.1238637653@parc.com>	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>	<F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>	<49D4F896.5010302@palladion.com>
	<982981EC-02C1-4C85-AAC9-3CAA0D3721E9@fuhm.net>
	<gr30dd$odj$1@ger.gmane.org>
Message-ID: <EF472F9D-235A-4D9A-86C8-1BA20D576102@fuhm.net>

On Apr 2, 2009, at 2:33 PM, Tres Seaver wrote:
> I don't understand why:  if RFC2047 values are being passed as HTTP
> headers, then surely the server has enough information to decode them,
> and to ensure that they are re-encoded into the same encoding as all
> other WSGI environment variables (under Python 2.x).

Just so long as the gateway server has an HTTP header parsing  
implementation and global knowledge of all HTTP headers, including  
private ones.

Consider:
FooBar: =?utf-8?q?some-text?=

Should that be decoded with RFC2047 rules? Answer: it depends. Does  
the grammar for FooBar say that the contents is of type TEXT? Maybe it  
just *looks* like an encoded-word but is actually just a sequence of  
tokens and separators which have an entirely different meaning for  
that header. You simply can't tell without the grammar for the FooBar  
header.
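
To illustrate with the FooBar value above: a generic decoder such as the
one in Python's email.header module decodes anything that looks like an
encoded-word. This is only a sketch of the problem, not a proposal for
what a WSGI server should do.

```python
from email.header import decode_header, make_header

raw_value = "=?utf-8?q?some-text?="

# decode_header() only looks at the value; it has no idea which header
# the value came from or what grammar that header uses.
parts = decode_header(raw_value)
decoded = str(make_header(parts))
```

The decoder has no way to know whether FooBar is defined as TEXT or as a
sequence of tokens and separators; that knowledge would have to come from
whoever calls it.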

James

From tseaver at palladion.com  Thu Apr  2 23:08:54 2009
From: tseaver at palladion.com (Tres Seaver)
Date: Thu, 02 Apr 2009 17:08:54 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <EF472F9D-235A-4D9A-86C8-1BA20D576102@fuhm.net>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>	<86217.1238608796@parc.com>	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>	<91243.1238637653@parc.com>	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>	<F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>	<49D4F896.5010302@palladion.com>	<982981EC-02C1-4C85-AAC9-3CAA0D3721E9@fuhm.net>	<gr30dd$odj$1@ger.gmane.org>
	<EF472F9D-235A-4D9A-86C8-1BA20D576102@fuhm.net>
Message-ID: <gr39hg$mll$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

James Y Knight wrote:
> On Apr 2, 2009, at 2:33 PM, Tres Seaver wrote:
>> I don't understand why:  if RFC2047 values are being passed as HTTP
>> headers, then surely the server has enough information to decode them,
>> and to ensure that they are re-encoded into the same encoding as all
>> other WSGI environment variables (under Python 2.x).
> 
> Just so long as the gateway server has an HTTP header parsing  
> implementation and global knowledge of all HTTP headers, including  
> private ones.
> 
> Consider:
> FooBar: =?utf-8?q?some-text?=
> 
> Should that be decoded with RFC2047 rules? Answer: it depends. Does  
> the grammar for FooBar say that the contents is of type TEXT? Maybe it  
> just *looks* like an encoded-word but is actually just a sequence of  
> tokens and separators which have an entirely different meaning for  
> that header. You simply can't tell without the grammar for the FooBar  
> header.

A couple of things:

- - That header is not even allowed by the HTTP RFC's, AFAIK.  "Custom"
  headers need the 'X-' prefix.

- - I could imagine a server option which disabled decoding for a specific
  subset of custom headers, but can't imagine needing it in any real
  application.

- - Leaving the WSGI environment a hodgepodge of differently-encoded
  junk makes *every* application have to deal with that stuff.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJ1Slm+gerLs4ltQ4RAtkvAJ9SM8P9YmB/D3JleoY/0C7kVMl5MgCbBMCb
+YavShebeoJU5Ijjc394LCQ=
=BuI3
-----END PGP SIGNATURE-----


From graham.dumpleton at gmail.com  Fri Apr  3 00:27:21 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 3 Apr 2009 09:27:21 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <982981EC-02C1-4C85-AAC9-3CAA0D3721E9@fuhm.net>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
	<F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>
	<49D4F896.5010302@palladion.com>
	<982981EC-02C1-4C85-AAC9-3CAA0D3721E9@fuhm.net>
Message-ID: <88e286470904021527o2fec14d1sae956379cd2e057a@mail.gmail.com>

2009/4/3 James Y Knight <foom at fuhm.net>:
>
> On Apr 2, 2009, at 1:40 PM, Tres Seaver wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> James Y Knight wrote:
>>>
>>> On Apr 2, 2009, at 7:33 AM, Graham Dumpleton wrote:
>>>
>>>> """When running under Python 3, servers MUST provide CGI HTTP
>>>> variables as strings, decoded from the headers using HTTP standard
>>>> encodings (i.e. latin-1 + RFC 2047)"""
>>>>
>>>> Which is fair enough and basically what the RFCs say. At the moment I
>>>> don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so just
>>>> need to do that.
>>>
>>> I'd really *really* like to recommend that any mention of RFC 2047 is
>>> stricken from the WSGI server requirements. I cannot imagine that
>>> decoding actually accomplishing anything other than opening security
>>> holes (think a filter in an upstream proxy that doesn't know how to do
>>> 2047-decoding passing something through that you now decode.)
>>>
>>> Also, you have to only do the decoding on TEXT words according to the
>>> spec, so the WSGI container now needs an HTTP header parser just in
>>> order to determine where it should decode RFC2047 words and where not
>>> to? I don't think so...
>>
>> Couldn't the spec mandate that decoding RFC 2047 headers is the
>> responsibility of the non-middleware WSGI server?  I agree that
>> middleware and applications shouldn't know or care about that problem.
>> Under Python 2.x, the server would transcode those values to the
>> "common" encoding used for all values in the WSGI environment;  under
>> Python 3.x, it would just decode them to unicode.
>>
>
> I think you're saying you agree with exactly the opposite of what I meant.
> The server/gateway (aka apache mod_wsgi) *must not* be required to handle
> RFC2047 decoding. Only the application (or a header parsing library that the
> application uses) can possibly handle this properly.
>
> That's why I think it should not be mentioned at all in the WSGI
> requirements for the server.
>
> Furthermore, although they certainly can if they want, I'd recommend that no
> applications actually bother with doing such decoding, since RFC2047 words
> in http headers are essentially never used.

Having the WSGI adapter ignore it would be fine by me, as it then
effectively mirrors the current behaviour of Python 2.X. That is, in
Python 2.X the WSGI application would have to deal with them anyway.

If RFC2047 comes into play in response headers as well, then that is
also the WSGI application's responsibility, given that it should be
returning bytes for response headers and so would have had to apply
such an encoding itself if necessary anyway.

For WSGI 1.0 and Python 3.0 we can therefore possibly maintain the status
quo, or as close to it as possible, with Python 2.X behaviour. If we want to
think about changing it, then address it in WSGI 2.0, where more
significant changes are being made anyway. Better that than WSGI 1.0
having different requirements under Python 2.X and Python 3.0.

Graham

From graham.dumpleton at gmail.com  Fri Apr  3 00:34:13 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 3 Apr 2009 09:34:13 +1100
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <gr2t3u$c72$1@ger.gmane.org>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
	<gr2t3u$c72$1@ger.gmane.org>
Message-ID: <88e286470904021534h11986b4bj90a6da309a67530e@mail.gmail.com>

2009/4/3 Tres Seaver <tseaver at palladion.com>:
>> DOCUMENT_ROOT: '/Library/WebServer/Documents'
>> SCRIPT_FILENAME: '/Users/grahamd/Sites/echo.wsgi'
>>
>> These are file system paths, and since the Apache Runtime Library used
>> for Apache 2.X has a define for whether file system supports unicode,
>> can say:
>>
>>   #if APR_HAS_UNICODE_FS
>>         charset = "UTF-8";
>>   #else
>>         charset = "ISO-8859-1";
>>   #endif
>
> I'm not sure that works for arbitrary filesystem configurations:  some
> parts of the tree may be mounted from locations with different
> encodings.  See David Wheeler's analysis for more:
>
>  http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

Yes, I am aware that it isn't that simple. I can make that the default
and, just as I have a configuration directive for case sensitivity in file
systems:

  http://code.google.com/p/modwsgi/wiki/ConfigurationDirectives#WSGICaseSensitivity

I can add one related to file system encoding. This would be similar
to how some other Apache modules allow overriding it. For example:

  http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#proxyftpdircharset

Graham

From fumanchu at aminus.org  Fri Apr  3 03:49:49 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Thu, 2 Apr 2009 18:49:49 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <91243.1238637653@parc.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com><F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local><ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com><86217.1238608796@parc.com><4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407CC5F07@ex10.hostedexchange.local>

Bill Janssen wrote:
> Alan Kennedy <alan at xhaus.com> wrote:
> > [Bill]
> > > I think the controlling reference here is RFC 3875.
> >
> > I think the controlling references are RFC 2616, RFC 2396
> > and RFC 3987.
> 
> I see what you're saying, but it's darn near impossible, as a
> practical matter, to get any guidance on encoding matters from those.
> 
> The question is where those names come from, and they come from CGI,
> and that is (practically speaking) defined these days by RFC 3875,
> as much as anything.

If so, then PEP 333 really should be updated to point at a version of
the CGI "spec" that doesn't reference e.g. RFC 1808 for URI's. As it is,
one could easily come to the conclusion that, for example, path
parameters like /path;a=3 aren't supported (because the CGI draft that
PEP 333 mentions disallows them). I'd be much happier referring to 3875,
and even happier diverging from strict compliance to what was always a
shaky spec.
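
For what it's worth, Python's own URL parser already treats path
parameters as a separate component. A small sketch with a made-up URL;
this says nothing about what a given WSGI server puts in PATH_INFO.

```python
from urllib.parse import urlparse

# Hypothetical URL; only the ";a=3" path parameter matters here.
parts = urlparse("http://example.com/path;a=3?x=1")

path = parts.path        # the path without the parameter
params = parts.params    # parameters of the last path segment
query = parts.query      # the query string, unaffected by the parameter
```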


Robert Brewer
fumanchu at aminus.org


From pje at telecommunity.com  Fri Apr  3 17:16:13 2009
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 03 Apr 2009 11:16:13 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.co
 m>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<b654cd2e0904011215s1962b397u36c91a63512de669@mail.gmail.com>
	<ca471dc20904011309t28c286fdgf60d88fad3272d15@mail.gmail.com>
	<88e286470904011351l4952262bt6fbf72ef0557ca16@mail.gmail.com>
Message-ID: <20090403151347.B40613A40B0@sparrow.telecommunity.com>

At 07:51 AM 4/2/2009 +1100, Graham Dumpleton wrote:
>If we are going to carry values in two different formats,

Let's try not to do that, either.  ;-)


From fumanchu at aminus.org  Fri Apr  3 20:35:20 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 3 Apr 2009 11:35:20 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com><F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local><ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com><86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407CC648F@ex10.hostedexchange.local>

Alan Kennedy wrote:
> [Bill]
> > I think the controlling reference here is RFC 3875.
> 
> I think the controlling references are RFC 2616, RFC 2396 and
> RFC 3987.
> 
> RFC 2616, the HTTP 1.1 spec, punts on the question of character
> encoding for the request URI.
> 
> RFC 2396, the URI spec, says
> 
> """
>    It is expected that a systematic treatment of character encoding
>    within URI will be developed as a future modification of this
>    specification.
> """
> 
> RFC 3987 is that spec, for Internationalized Resource Identifiers. It
> says
> 
> """
> An IRI is a sequence of characters from the Universal Character Set
> (Unicode/ISO 10646).
> """
> 
> and
> 
> """
> 1.2.  Applicability
> 
>    IRIs are designed to be compatible with recommendations for new URI
>    schemes [RFC2718].  The compatibility is provided by specifying a
>    well-defined and deterministic mapping from the IRI character
>    sequence to the functionally equivalent URI character sequence.
>    Practical use of IRIs (or IRI references) in place of URIs (or URI
>    references) depends on the following conditions being met:
> """
> 
> followed by
> 
> """
>    c.  The URI corresponding to the IRI in question has to encode
>        original characters into octets using UTF-8.  For new URI
>        schemes, this is recommended in [RFC2718].  It can apply to a
>        whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384],
>        or the URN syntax [RFC2141]).  It can apply to a specific part
>        of a URI, such as the fragment identifier (e.g., [XPointer]).  It
>        can apply to a specific URI or part(s) thereof.  For details,
>        please see section 6.4.
> """
> 
> I think the question is "are people using IRIs in the wild"? If so,
> then we must decide how do we best deal with the problems of
> recognising iso-8859-1+rfc2047 versus utf-8, or whatever
> server-configured encoding the user has chosen.

Agreed. The Request-URI needs to handle IRIs. The headers mostly
don't--almost all header values are of type "token", which is US-ASCII.
A few are of type "TEXT", which is ISO-8859-1/RFC 2047. The remaining
(sub)values are mostly custom byte sequences:

field-name           field-value
----------           -----------
Accept               token
Accept-Charset       token
Accept-Encoding      token
Accept-Language      ALPHA, plus ";", "=", "q" etc
Accept-Ranges        token
Age                  DIGIT
Allow                token
Authorization        token
Cache-Control        token
Connection           token
Content-Encoding     token
Content-Language     ALPHA
Content-Length       DIGIT
Content-Location     absoluteURI | relativeURI
Content-MD5          base64 of 128 bit md5 digest
Content-Range        DIGIT, plus "/" etc
Content-Type         token
Date                 HTTP-date
ETag                 TEXT and CHAR
Expect               token, quoted-string
Expires              HTTP-date
From                 ASCII (see RFC 822)
Host                 host ":" port
If-Match             TEXT and CHAR
If-Modified-Since    HTTP-date
If-None-Match        TEXT and CHAR
If-Range             TEXT and CHAR | HTTP-date
If-Unmodified-Since  HTTP-date
Last-Modified        HTTP-date
Location             absoluteURI
Max-Forwards         DIGIT
Pragma               token, quoted-string
Proxy-Authenticate   token
Proxy-Authorization  token
Range                token
Referer              absoluteURI | relativeURI
Retry-After          HTTP-date | DIGIT
Server               token, TEXT
TE                   token
Trailer              token
Transfer-Encoding    token
Upgrade              token
User-Agent           token, TEXT
Vary                 token
Via                  token, host, port
Warning              quoted-string, HTTP-date, host, port
WWW-Authenticate     token


The Content-Location, Location, and Referer headers are problematic
since HTTP borrows those from the URI spec, which deals in characters
and not bytes, as you mentioned. Host, and maybe Via, are also special
due to possible IDNA-encoding.
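
To illustrate the IDNA point, here is a minimal sketch of how an
application might turn an IDNA-encoded Host value back into Unicode
with Python's built-in "idna" codec. The helper name and the sample
host are assumptions made for this example; nothing in the servers
discussed here is claimed to work this way, and IPv6 literals are
deliberately not handled.

```python
def host_to_unicode(host_header):
    """Split an optional ":port" off a Host value and IDNA-decode the name.

    Hypothetical helper: assumes a plain "name[:port]" value, not an
    IPv6 literal such as "[::1]:80".
    """
    name, sep, port = host_header.partition(":")
    # The built-in "idna" codec converts "xn--" labels back to Unicode.
    return name.encode("ascii").decode("idna") + sep + port


print(host_to_unicode("xn--bcher-kva.example:8080"))
```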

Regarding extension headers, I think we should assume that the HTTP/1.1
spec implies all headers should be token (ASCII) or TEXT (ISO-8859-1).
From section 4.2:

    field-content  = <the OCTETs making up the field-value
                     and consisting of either *TEXT or combinations
                     of token, separators, and quoted-string>

In addition, the httpbis effort seems to be enforcing this even more
strongly [1]:

     message-header = field-name ":" OWS [ field-value ] OWS
     field-name     = token
     field-value    = *( field-content / OWS )
     field-content  = *( WSP / VCHAR / obs-text )

   Historically, HTTP has allowed field-content with text in the ISO-
   8859-1 [ISO-8859-1] character encoding (allowing other character sets
   through use of [RFC2047] encoding).  In practice, most HTTP header
   field-values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD constrain their field-values to
   US-ASCII characters.  Recipients SHOULD treat other (obs-text) octets
   in field-content as opaque data.

So, from where I sit, we have:

 1. Many header values which are ASCII.
 2. A few header values which are ISO-8859-1 plus RFC 2047.
 3. A few header values which are URIs (no specified encoding) or IRIs
    (UTF-8).

I understand the desire to decode ASAP, and I agree with Guido that we
should use a default encoding which the app can override. Looking at the
above, ISO-8859-1 is the best encoding I know of for all three header
cases. ASCII can be used as a valid subset without transcoding; headers
which are ISO-8859-1 are decoded perfectly; URI/IRI headers can be
transcoded by the app if needed, but mangled opaquely by middleware.

If we make *that* call, then IMO there's no reason not to do the same to
SCRIPT_NAME, PATH_INFO, and QUERY_STRING.
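
The reason ISO-8859-1 works as a default is that it maps every byte to
exactly one code point, so the original octets can always be recovered.
The following is a minimal sketch of that round trip under Python 3;
the helper name "transcode" and the sample path are assumptions made
for illustration only.

```python
def transcode(value, encoding="utf-8"):
    """Re-decode a str that the server decoded as ISO-8859-1.

    Hypothetical helper: recovers the original bytes, then decodes
    them with the encoding the application actually expects.
    """
    return value.encode("iso-8859-1").decode(encoding)


# Bytes as they might arrive on the wire for a UTF-8 encoded path.
raw = "/caf\u00e9".encode("utf-8")

# What a server following the ISO-8859-1 default would put in environ.
path_info = raw.decode("iso-8859-1")

# The decode is lossless: the original bytes are always recoverable.
assert path_info.encode("iso-8859-1") == raw

# An application that knows the path is UTF-8 can transcode it.
print(transcode(path_info))
```

Middleware that does not care about the encoding can pass the
ISO-8859-1 string through untouched, and the round trip still works
for whoever transcodes it later.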


Robert Brewer
fumanchu at aminus.org

[1]
http://www.ietf.org/internet-drafts/draft-ietf-httpbis-p1-messaging-06.txt


From fumanchu at aminus.org  Fri Apr  3 20:43:32 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 3 Apr 2009 11:43:32 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com><F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local><ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com><86217.1238608796@parc.com><4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com><91243.1238637653@parc.com><88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
	<88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407CC64A8@ex10.hostedexchange.local>

Graham Dumpleton wrote:
> I am slowly working through what I think I at least need to do for
> Apache/mod_wsgi. I'll give a summary of what I have worked out so far
> based on the discussions and my own research.
> ...
> Next HTTP header to worry about is HTTP_REFERER.
> 
> There would be two parts to this, there would be the host name
> component and then the path component.
> 
> We already know from above that for unicode host name it should be the
> IDNA name.
> 
> For the path component, if the client follows the rules properly, then
> if the path uses a non latin-1 encoding, then it should be using RFC
> 2047 to indicate this so shouldn't have to do anything different and
> use same rule as other HTTP headers. For this header we are actually
> in a better situation than for URL in actual HTTP request line which
> isn't so specific about encodings.

I don't think that's true. Referer must be absoluteURI or relativeURI,
neither of which have defined encodings. RFC 2047 only applies to
headers of type TEXT, of which there are surprisingly few.


Robert Brewer
fumanchu at aminus.org


From fumanchu at aminus.org  Fri Apr  3 20:46:04 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Fri, 3 Apr 2009 11:46:04 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com><F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local><ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com><86217.1238608796@parc.com><4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com><91243.1238637653@parc.com><88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com><88e286470904020433l5da48074i8918bddb6f0d67@mail.gmail.com>
	<F06A59E2-1A00-4FA4-9696-48399A0A7ECF@fuhm.net>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407CC64B2@ex10.hostedexchange.local>

James Y Knight wrote:
> On Apr 2, 2009, at 7:33 AM, Graham Dumpleton wrote:
> 
> > """When running under Python 3, servers MUST provide CGI HTTP
> > variables as strings, decoded from the headers using HTTP standard
> > encodings (i.e. latin-1 + RFC 2047)"""
> >
> > Which is fair enough and basically what the RFCs say. At the moment I
> > don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so just
> > need to do that.
> 
> I'd really *really* like to recommend that any mention of RFC 2047 is
> stricken from the WSGI server requirements. I cannot imagine that
> decoding actually accomplishing anything other than opening security
> holes (think a filter in an upstream proxy that doesn't know how to do
> 2047-decoding passing something through that you now decode.)
> 
> Also, you have to only do the decoding on TEXT words according to the
> spec, so the WSGI container now needs an HTTP header parser just in
> order to determine where it should decode RFC2047 words and where not
> to? I don't think so...

Something needs to decode RFC2047 words, at least until http-bis is
widespread. I'd be OK with making the app do it as needed (since only it
might know whether extension headers are token/quoted-string/TEXT).
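
If the decoding is left to the app, a sketch of what that could look
like with the standard library's email.header.decode_header follows.
The function name "decode_rfc2047" is an assumption for this example,
and the sketch assumes the app has already decided that the value is
of type TEXT; it does nothing about the security concern James raises.

```python
from email.header import decode_header


def decode_rfc2047(value, default="iso-8859-1"):
    """Decode RFC 2047 encoded-words in a header value the app knows is TEXT.

    Hypothetical helper: parts without an explicit charset are treated
    as ISO-8859-1, matching the HTTP/1.1 default for TEXT.
    """
    parts = []
    for chunk, charset in decode_header(value):
        # decode_header returns str for values with no encoded-words
        # and bytes (plus a charset or None) otherwise.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or default)
        parts.append(chunk)
    return "".join(parts)


print(decode_rfc2047("=?utf-8?q?caf=C3=A9?="))
print(decode_rfc2047("plain ASCII value"))
```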


Robert Brewer
fumanchu at aminus.org


From deron.meranda at gmail.com  Fri Apr  3 23:22:11 2009
From: deron.meranda at gmail.com (Deron Meranda)
Date: Fri, 3 Apr 2009 17:22:11 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6407CC648F@ex10.hostedexchange.local>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407CC648F@ex10.hostedexchange.local>
Message-ID: <5c06fa770904031422h2d164081id28651d49230ff3c@mail.gmail.com>

> ... The Request-URI needs to handle IRI's. The headers mostly
> don't--almost all headers are of mostly type "token", which is US-ASCII.
> A few are of type "TEXT", which is ISO-8859-1/RFC 2047. The remaining
> (sub)values are mostly custom byte sequences: ...

Also don't forget about the still-in-draft Link header that is
getting a lot of attention currently (especially as it relates to
resource discovery).

  http://tools.ietf.org/id/draft-nottingham-http-link-header-04.txt

It includes IRIs, along with some other information.
-- 
Deron Meranda

From randy at rcs-comp.com  Sat Apr  4 22:08:27 2009
From: randy at rcs-comp.com (Randy Syring)
Date: Sat, 04 Apr 2009 16:08:27 -0400
Subject: [Web-SIG] Reverse Proxy & HTTPS
Message-ID: <49D7BE3B.5040709@rcs-comp.com>

I have a Python application that I want to run with the CherryPy WSGI 
Server.  My intention is to let the CherryPy server run on a non 
standard port (say 9001) and then let IIS (yes, I know what you are 
thinking, but that is what I have to work with) reverse proxy the 
website requests to CherryPy.

However, I am wondering how I should handle HTTPS.  Currently, there are 
only a few pages in my app that need HTTPS.  When running the app 
natively in IIS, if one of those pages is requested using HTTP, I will 
issue a HTTP header redirect to the HTTPS page.  How should I handle 
this in a reverse proxy situation?  What I mean is, how do I detect in 
my Python app if the original request to IIS is using SSL?  I don't want 
to have to run SSL on the connection from IIS to CherryPy.

I am thinking I could modify the headers to the CherryPy server adding 
something like "X-is-ssl" and then use middleware on the python side to 
set wsgi.url_scheme appropriately.  I just don't know the HTTP standard 
well enough to know how this kind of thing should be handled.

Thank you for any help you can provide.

-- 
--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31


From cs at zip.com.au  Sun Apr  5 01:55:06 2009
From: cs at zip.com.au (Cameron Simpson)
Date: Sun, 5 Apr 2009 09:55:06 +1000
Subject: [Web-SIG] Reverse Proxy & HTTPS
In-Reply-To: <49D7BE3B.5040709@rcs-comp.com>
Message-ID: <20090404235506.GA23458@cskk.homeip.net>

On 04Apr2009 16:08, Randy Syring <randy at rcs-comp.com> wrote:
> I have a Python application that I want to run with the CherryPy WSGI  
> Server.  My intention is to let the CherryPy server run on a non  
> standard port (say 9001) and then let IIS (yes, I know what you are  
> thinking, but that is what I have to work with) reverse proxy the  
> website requests to CherryPy.
>
> However, I am wondering how I should handle HTTPS.  Currently, there are  
> only a few pages in my app that need HTTPS.  When running the app  
> natively in IIS, if one of those pages is requested using HTTP, I will  
> issue a HTTP header redirect to the HTTPS page.  How should I handle  
> this in a reverse proxy situation?  What I mean is, how do I detect in  
> my Python app if the original request to IIS is using SSL?  I don't want  
> to have to run SSL on the connection from IIS to CherryPy.
>
> I am thinking I could modify the headers to the CherryPy server adding  
> something like "X-is-ssl" and then use middleware on the python side to  
> set wsgi.url_scheme appropriately.  I just don't know the HTTP standard  
> well enough to know how this kind of thing should be handled.

How tightly knit is the IIS i.e. do you have control over it?  Maybe this
rewrite thing should be set up in IIS instead, it seems the more obvious
place for such control except that the rewrite config would no longer
be "part of the app". At least the IIS server should know if it's http
or https. Or are you wanting to make your CherryPy app robust against
http misuse?

Disclaimer: I know close to nothing about IIS; this is just how I'd be
approaching it with an Apache reverse proxy front end.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

From sh at defuze.org  Mon Apr  6 14:11:30 2009
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Mon, 6 Apr 2009 14:11:30 +0200 (CEST)
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <4a951aa00904020419pe98287ds9443f3bb32c03f27@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<91243.1238637653@parc.com>
	<88e286470904012101q6a4b42fbwc1694594361cc3ea@mail.gmail.com>
	<54714.193.253.216.132.1238655419.squirrel@mail1.webfaction.com>
	<4a951aa00904020419pe98287ds9443f3bb32c03f27@mail.gmail.com>
Message-ID: <62303.193.253.216.132.1239019890.squirrel@mail1.webfaction.com>

Probably of interest in regards to this discussion:

http://lists.w3.org/Archives/Public/ietf-http-wg/2009AprJun/0057.html
http://trac.tools.ietf.org/wg/httpbis/trac/ticket/63

This applies to headers but probably shows that RFC 2047 is gradually
being ruled out of HTTP.

- Sylvain
-- 
Sylvain Hellegouarch
http://www.defuze.org

From randy at rcs-comp.com  Mon Apr  6 18:24:42 2009
From: randy at rcs-comp.com (Randy Syring)
Date: Mon, 06 Apr 2009 12:24:42 -0400
Subject: [Web-SIG] Reverse Proxy & HTTPS
In-Reply-To: <20090404235506.GA23458@cskk.homeip.net>
References: <20090404235506.GA23458@cskk.homeip.net>
Message-ID: <49DA2CCA.3030402@rcs-comp.com>

Cameron Simpson wrote:
> On 04Apr2009 16:08, Randy Syring <randy at rcs-comp.com> wrote:
>   
> How tightly knit is the IIS i.e. do you have control over it?  Maybe this
> rewrite thing should be set up in IIS instead, it seems the more obvious
> place for such control except that the rewrite config would no longer
> be "part of the app". At least the IIS server should know if it's http
> or https. Or are you wanting to make your CherryPy app robust against
> http misuse?
>
> Disclaimer: I know close to nothing about IIS; this is just how I'd be
> approaching it with an Apache reverse proxy front end.
>
> Cheers,
>   
Cameron,

Thanks for your reply.  Let me start out by saying that I don't think 
this is an IIS issue, it's just that IIS is the front-end web server that
is proxying the HTTP requests through to the CherryPy server.  If I was 
to choose to run a similar setup on a Linux box with Apache, I still 
think I would have the same question (feel free to correct me if I am 
wrong).

I would like my application to have control over the HTTPS<->HTTP 
redirects and would rather not force that logic into the forward facing 
web server if at all possible.  That just seems like an extra 
configuration step that wouldn't necessarily be needed if I could figure 
out how to pass SSL status from the forward facing web server to the 
backend proxy (i.e. CherryPy and my app).

So, do you (or anyone else) know of a good way to do this?  Or, does
everyone just assume that it is all or nothing for SSL when you are 
proxying to a backend?

Thank you.

--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31


From pstradomski at gmail.com  Mon Apr  6 18:32:18 2009
From: pstradomski at gmail.com (=?utf-8?q?Pawe=C5=82_Stradomski?=)
Date: Mon, 6 Apr 2009 18:32:18 +0200
Subject: [Web-SIG] Reverse Proxy & HTTPS
In-Reply-To: <49DA2CCA.3030402@rcs-comp.com>
References: <20090404235506.GA23458@cskk.homeip.net>
	<49DA2CCA.3030402@rcs-comp.com>
Message-ID: <200904061832.18551.pstradomski@gmail.com>

In a message dated Monday, 6 April 2009, Randy Syring wrote:

> I would like my application to have control over the HTTPS<->HTTP
> redirects and would rather not force that logic into the forward facing
> web server if at all possible.  That just seems like an extra
> configuration step that wouldn't necessarily be needed if I could figure
> out how to pass SSL status from the forward facing web server to the
> backend proxy (i.e. CherryPy and my app).
>
> So, do you (or anyone else) know of a good way to do this?  Or, does
> everyone just assume that it is all or nothing for SSL when you are
> proxying to a backend?
>
Check the IIS manual; it should be possible to set some nonstandard header
when the connection goes through SSL, and then check this header in your
application. Maybe that header is already there: write a simple controller
that prints all the headers from the request and compare how it looks with
and without SSL (but verify with the IIS manual anyway).
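
A header-dumping controller of the kind suggested above can be a
plain WSGI callable. This is a minimal sketch, assuming Python 3 and
no framework; the name "dump_headers" is made up for the example.

```python
def dump_headers(environ, start_response):
    """WSGI app that echoes every HTTP_* variable it received.

    Hypothetical diagnostic helper: request it once over HTTP and once
    over HTTPS through the proxy and compare the two outputs.
    """
    lines = [
        "%s: %s" % (key, value)
        for key, value in sorted(environ.items())
        if key.startswith("HTTP_")
    ]
    body = "\n".join(lines).encode("iso-8859-1", "replace")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```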

-- 
Paweł Stradomski

From graham.dumpleton at gmail.com  Tue Apr  7 00:35:05 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Tue, 7 Apr 2009 08:35:05 +1000
Subject: [Web-SIG] Reverse Proxy & HTTPS
In-Reply-To: <200904061832.18551.pstradomski@gmail.com>
References: <20090404235506.GA23458@cskk.homeip.net>
	<49DA2CCA.3030402@rcs-comp.com>
	<200904061832.18551.pstradomski@gmail.com>
Message-ID: <88e286470904061535h24ceb4f1yd28ca26567349a78@mail.gmail.com>

Using nginx as front end to Apache/mod_wsgi as an example:

On nginx side you would use:

  proxy_set_header X-Url-Scheme $scheme;

and on Apache/mod_wsgi side, with Django 1.0 as an example, in WSGI
script file we would have:

  import os, sys
  sys.path.append('/usr/local/django')

  os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

  import django.core.handlers.wsgi

  _application = django.core.handlers.wsgi.WSGIHandler()

  def application(environ, start_response):
    environ['wsgi.url_scheme'] = environ.get('HTTP_X_URL_SCHEME', 'http')
    return _application(environ, start_response)

What you need is the equivalent of that nginx directive on the IIS
side, as others have mentioned.

Graham


From randy at rcs-comp.com  Tue Apr  7 01:01:27 2009
From: randy at rcs-comp.com (Randy Syring)
Date: Mon, 06 Apr 2009 19:01:27 -0400
Subject: [Web-SIG] Reverse Proxy & HTTPS
In-Reply-To: <88e286470904061535h24ceb4f1yd28ca26567349a78@mail.gmail.com>
References: <20090404235506.GA23458@cskk.homeip.net>	<49DA2CCA.3030402@rcs-comp.com>	<200904061832.18551.pstradomski@gmail.com>
	<88e286470904061535h24ceb4f1yd28ca26567349a78@mail.gmail.com>
Message-ID: <49DA89C7.2080809@rcs-comp.com>

Graham,

Excellent, thank you!  That confirms for me the concept is correct, now 
all I have to do is work on an IIS implementation.  FUN!

--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31



From ianb at colorstudy.com  Tue Apr  7 01:32:20 2009
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 6 Apr 2009 18:32:20 -0500
Subject: [Web-SIG] Reverse Proxy & HTTPS
In-Reply-To: <49DA89C7.2080809@rcs-comp.com>
References: <20090404235506.GA23458@cskk.homeip.net>
	<49DA2CCA.3030402@rcs-comp.com> 
	<200904061832.18551.pstradomski@gmail.com>
	<88e286470904061535h24ceb4f1yd28ca26567349a78@mail.gmail.com> 
	<49DA89C7.2080809@rcs-comp.com>
Message-ID: <b654cd2e0904061632x6e6a1a7fu5e46680ac7fa9961@mail.gmail.com>

A last note: paste.deploy.config.PrefixMiddleware does some fixup for cases
like this, including looking at X-Forwarded-Scheme and X-Forwarded-Proto for
the protocol (both names, because there's nothing approaching consensus on
what to name these headers).
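
Because the header name varies between setups, a middleware can check
several candidates in turn. The following is only a sketch of that
idea and is not the PrefixMiddleware implementation; the class name
"SchemeFixup" and the list of environ keys are assumptions, and it
should only be used when the proxy is the sole way to reach the
backend, since clients could otherwise forge these headers.

```python
# Candidate environ keys, in order of preference (an assumed list).
SCHEME_HEADERS = (
    "HTTP_X_FORWARDED_SCHEME",
    "HTTP_X_FORWARDED_PROTO",
    "HTTP_X_URL_SCHEME",
)


class SchemeFixup:
    """WSGI middleware that sets wsgi.url_scheme from a proxy header."""

    def __init__(self, app, headers=SCHEME_HEADERS):
        self.app = app
        self.headers = headers

    def __call__(self, environ, start_response):
        for key in self.headers:
            value = environ.get(key, "").lower()
            # Ignore anything that is not a scheme we expect.
            if value in ("http", "https"):
                environ["wsgi.url_scheme"] = value
                break
        return self.app(environ, start_response)
```

Wrapping an existing application is then a matter of
"application = SchemeFixup(application)".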




-- 
Ian Bicking  |  http://blog.ianbicking.org

From arw1961 at yahoo.com  Tue Apr  7 15:13:39 2009
From: arw1961 at yahoo.com (Aaron Watters)
Date: Tue, 7 Apr 2009 06:13:39 -0700 (PDT)
Subject: [Web-SIG] Please look at WHIFF -- WSGI/HTTP INTEGRATED FILESYSTEM
	FRAMES
Message-ID: <882586.14227.qm@web32003.mail.mud.yahoo.com>


Hi folks,

I tried this announcement on some easy
going lists yesterday and no one has
taken me to the woodshed yet, so I thought
I'd have a go at a tougher crowd.

I'm releasing a WSGI component suite called
WHIFF and I'd just love it if you folks
would have a look and comment/suggest/criticize/complain.
If you'd like to try it out -- even better.

Please go to

http://whiff.sourceforge.net

Or use one of the links in the announcement
below.

  Thanks -- Aaron Watters

===
THIS .SIG IS INTENTIONALLY LEFT BLANK
===


WHIFF -- WSGI/HTTP INTEGRATED FILESYSTEM FRAMES

WHIFF is an infrastructure for easily building 
complex Python/WSGI Web applications by combining 
smaller and simpler WSGI components organized 
within file system trees.

To DOWNLOAD WHIFF go to the WHIFF project 
information page at 

http://sourceforge.net/projects/whiff 

and follow the download instructions.

To GET THE LATEST WHIFF clone the 
WHIFF Mercurial repository located at 

http://aaron.oirt.rutgers.edu/cgi-bin/whiffRepo.cgi.

To READ ABOUT WHIFF view the WHIFF documentation at 

http://aaron.oirt.rutgers.edu/myapp/docs/W.intro.

To PLAY WITH WHIFF try the demos listed in the demos page at 

http://aaron.oirt.rutgers.edu/myapp/docs/W1300.testAndDemo.

Why WHIFF?
==========
WHIFF (WSGI HTTP Integrated Filesystem Frames) 
is intended to make it easier to create, deploy, 
and maintain large and complex Python based WSGI 
Web applications. I created WHIFF to address 
complexity issues I encounter when creating and 
fixing sophisticated Web applications which 
include complex database interactions 
and dynamic features such as AJAX 
(Asynchronous JavaScript and XML).

The primary tools which reduce complexity are 
an infrastructure for managing web application 
name spaces, a configuration template language 
for wiring named components into an application, 
and an application programming interface for
accessing named components from Python and
JavaScript modules.

All supporting conventions and tools offered by 
WHIFF are optional. WHIFF is designed to work well 
with other modules conformant to the 
WSGI (Web Server Gateway Interface) standard.
Developers and designers are free to use those 
WHIFF tools that work for them and ignore or 
replace the others.

WHIFF does not provide a "packaged cake mix" 
for baking a web application. 
Instead WHIFF is designed to provide a set of 
ingredients which can be easily combined to make 
web applications (with no need to refine your own 
sugar or mill your own wheat). 

I hope you like it.  -- Aaron Watters


From brian at briansmith.org  Wed Apr  8 05:30:55 2009
From: brian at briansmith.org (Brian Smith)
Date: Tue, 7 Apr 2009 22:30:55 -0500
Subject: [Web-SIG] FW: Closing #63: RFC2047 encoded words
Message-ID: <00cc01c9b7fa$6dd82a80$49887f80$@org>

Here is the change that removes the use of RFC 2047 from HTTP in HTTPbis.

-----Original Message-----
From: ietf-http-wg-request at w3.org [mailto:ietf-http-wg-request at w3.org] On
Behalf Of Mark Nottingham
Sent: Monday, April 06, 2009 5:00
To: HTTP Working Group
Subject: Closing #63: RFC2047 encoded words

The editors believe that issue #63 has been addressed by the changes  
in the -06 drafts.

Specifically, RFC2047 encoding is no longer suggested as the default  
encoding for non-ASCII characters; rather, it is left up to specific  
header definitions to specify.


From fumanchu at aminus.org  Wed Apr  8 18:57:43 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Wed, 8 Apr 2009 09:57:43 -0700
Subject: [Web-SIG] FW: Closing #63: RFC2047 encoded words
In-Reply-To: <00cc01c9b7fa$6dd82a80$49887f80$@org>
References: <00cc01c9b7fa$6dd82a80$49887f80$@org>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6407DC9CE8@ex10.hostedexchange.local>

Brian Smith wrote:
> Here is the change that removes the use of RFC 2047 from HTTP in
> HTTPbis.

Yes, but parsers need to continue decoding them for many years to come.
IMO WSGI origin servers should do this so we can write the decoding
logic once and forget about it (assuming middleware and apps far
outnumber origin servers).


Robert Brewer
fumanchu at aminus.org


From foom at fuhm.net  Wed Apr  8 20:14:10 2009
From: foom at fuhm.net (James Y Knight)
Date: Wed, 8 Apr 2009 14:14:10 -0400
Subject: [Web-SIG] FW: Closing #63: RFC2047 encoded words
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6407DC9CE8@ex10.hostedexchange.local>
References: <00cc01c9b7fa$6dd82a80$49887f80$@org>
	<F1962646D3B64642B7C9A06068EE1E6407DC9CE8@ex10.hostedexchange.local>
Message-ID: <8389CCA8-8ABD-49A0-AEB8-11F26083DBA5@fuhm.net>

On Apr 8, 2009, at 12:57 PM, Robert Brewer wrote:
> Yes, but parsers need to continue decoding them for many years to  
> come.
> IMO WSGI origin servers should do this so we can write the decoding
> logic once and forget about it (assuming middleware and apps far
> outnumber origin servers).

Decoding RFC 2047 quoted words is rather trivial compared to correctly  
parsing all the HTTP headers. Plus, as I said before, you can't even  
*do* the RFC2047 decoding without parsing the headers at the same time  
to figure out which pieces need to be decoded! And furthermore, nobody  
needs to "continue" decoding them for years to come, *because nobody  
decodes them now*!
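
For reference, decoding a single encoded word in isolation might look
like the sketch below, using the standard library's email.header
module. The header value is an invented example, and the sketch does
nothing about the harder part, which is working out which pieces of a
real HTTP header are eligible for decoding.

```python
from email.header import decode_header, make_header

# Hypothetical header value containing one RFC 2047 encoded word.
raw_value = "=?iso-8859-1?q?p=F6stal?="

# decode_header() returns (bytes, charset) pairs; make_header() joins them
# back into a Header object whose str() is the decoded text.
decoded = str(make_header(decode_header(raw_value)))
print(decoded)
```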

WSGI is intentionally exposing a fairly low-level view of the world.  
So my opinion is that the headers in the dict should be byte strings  
and that anyone who wants decoded headers also probably really wants  
(or ought to want!) parsed headers, and thus should be using an http  
header parsing library. That can expose values as unicode strings if  
it wants to.

If you want to start a discussion about having a standard parsed- 
header object in WSGI, that's another thing, but saying that WSGI  
servers should *partially* decode the headers seems rather silly to me.

James


From brian at briansmith.org  Wed Apr  8 20:20:28 2009
From: brian at briansmith.org (Brian Smith)
Date: Wed, 8 Apr 2009 13:20:28 -0500
Subject: [Web-SIG] FW: Closing #63: RFC2047 encoded words
Message-ID: <000801c9b876$b7423130$25c69390$@org>

Robert Brewer wrote:
> Brian Smith wrote:
> > Here is the change that removes the use of RFC 2047 from HTTP in 
> > HTTPbis.
> 
> Yes, but parsers need to continue decoding them for many years to come.
> IMO WSGI origin servers should do this so we can write the decoding 
> logic once and forget about it (assuming middleware and apps far 
> outnumber origin servers).

No, it really is better for WSGI implementations to completely avoid RFC
2047. None of the HTTP specifications ever specified how RFC 2047 was to be
used in HTTP. RFC 2616 vaguely suggested the use of RFC 2047 but it was
never integrated into any part of the grammar. In the long discussions on
this topic in the HTTP working group, nobody ever presented a real-life
example where RFC 2047 encoding has actually been used. The hypothetical
examples that were presented in the discussion were found to violate RFC
2047 and/or other parts of the HTTP specification. Nobody ever presented an
example (even hypothetical) using RFC 2047 encoding that the working group
agreed was valid.


From ianb at colorstudy.com  Wed Apr  8 21:01:39 2009
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 8 Apr 2009 14:01:39 -0500
Subject: [Web-SIG] FW: Closing #63: RFC2047 encoded words
In-Reply-To: <8389CCA8-8ABD-49A0-AEB8-11F26083DBA5@fuhm.net>
References: <00cc01c9b7fa$6dd82a80$49887f80$@org>
	<F1962646D3B64642B7C9A06068EE1E6407DC9CE8@ex10.hostedexchange.local>
	<8389CCA8-8ABD-49A0-AEB8-11F26083DBA5@fuhm.net>
Message-ID: <b654cd2e0904081201m75600378mda7ad749ad6899b9@mail.gmail.com>

On Wed, Apr 8, 2009 at 1:14 PM, James Y Knight <foom at fuhm.net> wrote:

> If you want to start a discussion about having a standard parsed-header
> object in WSGI, that's another thing,


Off topic to this discussion, but that's what WebOb is.  It also largely
handles the encoding issues, abstracts away the awkwardness of the WSGI call
signature, and also does header parsing.

-- 
Ian Bicking  |  http://blog.ianbicking.org

From alan at xhaus.com  Thu Apr  9 01:58:54 2009
From: alan at xhaus.com (Alan Kennedy)
Date: Thu, 9 Apr 2009 00:58:54 +0100
Subject: [Web-SIG] FW: Closing #63: RFC2047 encoded words
In-Reply-To: <8389CCA8-8ABD-49A0-AEB8-11F26083DBA5@fuhm.net>
References: <00cc01c9b7fa$6dd82a80$49887f80$@org>
	<F1962646D3B64642B7C9A06068EE1E6407DC9CE8@ex10.hostedexchange.local>
	<8389CCA8-8ABD-49A0-AEB8-11F26083DBA5@fuhm.net>
Message-ID: <4a951aa00904081658v66892850wce13e5a8093b38c6@mail.gmail.com>

[James]
> If you want to start a discussion about having a standard parsed-header
> object in WSGI, that's another thing, but saying that WSGI servers should
> *partially* decode the headers seems rather silly to me.

Hi James,

It's a shame that your proposal to add the twisted header parsing
library to the standard library didn't catch on years ago.

http://mail.python.org/pipermail/web-sig/2006-February/002119.html

Alan.

From alan at xhaus.com  Thu Apr  9 01:59:11 2009
From: alan at xhaus.com (Alan Kennedy)
Date: Thu, 9 Apr 2009 00:59:11 +0100
Subject: [Web-SIG] FW: Closing #63: RFC2047 encoded words
In-Reply-To: <00cc01c9b7fa$6dd82a80$49887f80$@org>
References: <00cc01c9b7fa$6dd82a80$49887f80$@org>
Message-ID: <4a951aa00904081659k31e6464co7480474fd62d30ab@mail.gmail.com>

[Brian]
> Here is the change that removes the use of RFC 2047 from HTTP in HTTPbis.

Grand so; all we need to do is to wait for everyone to stop using
HTTP/1.1, start using HTTP/bis, and our problems are at an end!

;-)

Alan.

From brian at briansmith.org  Thu Apr  9 04:36:22 2009
From: brian at briansmith.org (Brian Smith)
Date: Wed, 8 Apr 2009 21:36:22 -0500
Subject: [Web-SIG] FW: Closing #63: RFC2047 encoded words
In-Reply-To: <4a951aa00904081659k31e6464co7480474fd62d30ab@mail.gmail.com>
References: <00cc01c9b7fa$6dd82a80$49887f80$@org>
	<4a951aa00904081659k31e6464co7480474fd62d30ab@mail.gmail.com>
Message-ID: <001e01c9b8bc$01bdfb00$0539f100$@org>

Alan Kennedy wrote:
> [Brian]
> > Here is the change that removes the use of RFC 2047 from HTTP in
> HTTPbis.
> 
> Grand so; all we need to do is to wait for everyone to stop using
> HTTP/1.1, start using HTTP/bis, and our problems are at an end!

HTTPbis *is* (will be) HTTP/1.1. It doesn't define a new version of the
protocol. RFC 2616 has many mistakes that make it a poor description of
HTTP/1.1 and the purpose of HTTPbis is to fix those mistakes. That is a
little bit of an over-simplification.

Try to create a RFC2616-compliant message that uses RFC 2047 encoding. It
can't be done because RFC 2047 was never integrated into the RFC 2616
grammar. That is why HTTPbis removed the vague reference to RFC 2047 from
the prose. If RFC 2616 provided a way of using RFC 2047 in HTTP messages
then HTTPbis would still allow it but recommend that implementations SHOULD
NOT use it (similar to how line-folding is deprecated but still allowed in
HTTPbis).

- Brian


From pfein at pobox.com  Fri Apr 10 23:12:51 2009
From: pfein at pobox.com (Pete)
Date: Fri, 10 Apr 2009 16:12:51 -0500
Subject: [Web-SIG] RESTful Python email list?
Message-ID: <FF26E3F2-6CA8-4A2E-8C5C-77CA8EABC80D@pobox.com>

This came up at the REST BoF at Pycon...

Any interest in a dedicated email list for REST + python, a la the  
restful-json group [0]?  The group would discuss strategies for REST  
architecture built with and within Python.  WSGI 1.0 vs. 2.0 vs. 2e6  
is out of scope. ;-)

--Pete

[0] - http://groups.google.com/group/restful-json

From alan at xhaus.com  Sat Apr 11 15:05:16 2009
From: alan at xhaus.com (Alan Kennedy)
Date: Sat, 11 Apr 2009 14:05:16 +0100
Subject: [Web-SIG] RESTful Python email list?
In-Reply-To: <FF26E3F2-6CA8-4A2E-8C5C-77CA8EABC80D@pobox.com>
References: <FF26E3F2-6CA8-4A2E-8C5C-77CA8EABC80D@pobox.com>
Message-ID: <4a951aa00904110605o625554d9x61ed39420e523825@mail.gmail.com>

[Pete]
> Any interest in a dedicated email list for REST + python, a la the
> restful-json group [0]?  The group would discuss strategies for REST
> architecture built with and within Python.  WSGI 1.0 vs. 2.0 vs. 2e6 is out
> of scope. ;-)

Just a thought: is there any reason why RESTful python discussions
cannot take place on the restful-json group referred to?

Alan.

From jim at zope.com  Sat Apr 11 16:01:45 2009
From: jim at zope.com (Jim Fulton)
Date: Sat, 11 Apr 2009 10:01:45 -0400
Subject: [Web-SIG] RESTful Python email list?
In-Reply-To: <FF26E3F2-6CA8-4A2E-8C5C-77CA8EABC80D@pobox.com>
References: <FF26E3F2-6CA8-4A2E-8C5C-77CA8EABC80D@pobox.com>
Message-ID: <CC47C068-794D-4697-BB53-2F93D9A984B4@zope.com>


On Apr 10, 2009, at 5:12 PM, Pete wrote:

> This came up at the REST BoF at Pycon...
>
> Any interest in a dedicated email list for REST + python, a la the  
> restful-json group [0]?  The group would discuss strategies for REST  
> architecture built with and within Python.  WSGI 1.0 vs. 2.0 vs. 2e6  
> is out of scope. ;-)


-1

I'd be happy to see the discussions here.

Jim

--
Jim Fulton
Zope Corporation


From milesck at umich.edu  Sun Apr 12 02:48:59 2009
From: milesck at umich.edu (Miles Kaufmann)
Date: Sat, 11 Apr 2009 20:48:59 -0400
Subject: [Web-SIG] Python 3: Form data encoding issues in cgi and urllib
	modules
Message-ID: <5ec9495f0904111748p49ad255bib898d41e05e57d3d@mail.gmail.com>

Hi everyone,

I read through the recent archives, and I've seen some discussion on
similar topics, but not this exact topic recently, so if the solution
to these issues has already been decided, please point me to the
relevant messages.  (Also, if this isn't the most appropriate list,
please let me know!)

The first issue is that there doesn't seem to be a way to parse
x-www-form-urlencoded query strings in a character set other than
UTF-8, for example:

'premier=un&deuxi%E8me=deux' # latin-1

The urllib.parse.unquote* functions take encoding and errors
parameters, but none of the higher-level functions do.  The solution
to me seems to be that the functions that build on top of
them--urllib.parse.parse*, cgi.parse*, and the cgi.FieldStorage
constructor--should grow encoding and errors parameters that they pass
through to the lower-level functions.
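
As a rough sketch of the pass-through idea, a higher-level parser could
simply forward encoding and errors to urllib.parse.unquote_plus. The
helper name parse_qs_with_encoding is hypothetical and is not an
existing standard library function.

```python
from urllib.parse import unquote_plus

def parse_qs_with_encoding(query, encoding="utf-8", errors="replace"):
    """Parse a query string, decoding percent-escapes with `encoding`.

    Hypothetical helper: it only forwards `encoding` and `errors` to the
    lower-level unquote_plus() call.
    """
    fields = {}
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing "&"
        name, _, value = pair.partition("=")
        name = unquote_plus(name, encoding=encoding, errors=errors)
        value = unquote_plus(value, encoding=encoding, errors=errors)
        fields.setdefault(name, []).append(value)
    return fields

result = parse_qs_with_encoding("premier=un&deuxi%E8me=deux",
                                encoding="latin-1")
print(result)
```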

The second issue is that the FieldStorage classes work with text input
streams.  However, with multipart/form-data posts, posted files aren't
necessarily in the same encoding as form fields, or may be binary and
not text at all.  I would suggest that FieldStorage should be changed
to take a binary input stream. For multipart forms, it should only
attempt to decode a part with the passed-in FieldStorage encoding if
the part's content type is text/plain and the content-disposition does
not specify a filename; otherwise, field.file would be a binary file,
and field.value should be bytes or non-existent.

Here is an example form submission that is currently difficult to
handle with the cgi module, posted from a page with a charset of UTF-8
and two attached files; this is similar to how a real form submission
from Safari or Firefox would look:

post_input = b"""---123
Content-Disposition: form-data; name="utf8text"

\xc2\xa1ol\xc3\xa9!
---123
Content-Disposition: form-data; name="file1"; filename="latin1.txt"
Content-Type: text/plain

Oh l\xe0 l\xe0!
---123
Content-Disposition: form-data; name="file2"; filename="binary"
Content-Type: application/octet-stream

\x80\x81\x82\x83\x84\x85\x86\x87\xad\xf0
---123--
"""

environ = {
    'CONTENT_LENGTH': str(len(post_input)),
    'CONTENT_TYPE': 'multipart/form-data; boundary=-123',
    'REQUEST_METHOD': 'POST',
}
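
The decoding rule suggested above (decode a part only when it is
text/plain and carries no filename) could be sketched roughly as
follows. This is not how cgi.FieldStorage works; parse_form_parts is a
hypothetical helper, and it assumes the delimiter never occurs inside a
payload and that a payload does not itself end in a bare carriage
return.

```python
import re
from email.parser import BytesParser

def parse_form_parts(body, boundary, encoding="utf-8"):
    """Split a multipart/form-data body; decode only plain text fields."""
    delimiter = b"--" + boundary
    fields = {}
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter reached
        # Drop the line break after the delimiter and the one that
        # precedes the next delimiter.
        chunk = re.sub(rb"\A\r?\n", b"", chunk)
        chunk = re.sub(rb"\r?\n\Z", b"", chunk)
        head, payload = re.split(rb"\r?\n\r?\n", chunk, maxsplit=1)
        headers = BytesParser().parsebytes(head, headersonly=True)
        name = headers.get_param("name", header="content-disposition")
        is_text = (headers.get_content_type() == "text/plain"
                   and headers.get_filename() is None)
        # Text fields become str; uploads stay as bytes.
        fields[name] = payload.decode(encoding) if is_text else payload
    return fields

example_body = (
    b'---123\n'
    b'Content-Disposition: form-data; name="utf8text"\n'
    b'\n'
    b'\xc2\xa1ol\xc3\xa9!\n'
    b'---123\n'
    b'Content-Disposition: form-data; name="file1"; filename="latin1.txt"\n'
    b'Content-Type: text/plain\n'
    b'\n'
    b'Oh l\xe0 l\xe0!\n'
    b'---123--\n'
)
fields = parse_form_parts(example_body, b"-123")
print(fields)
```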

It's possible that the email.mime and http packages might also need
some changes made, but I haven't looked into those as much.  Also,
cgi.parse_multipart seems to be broken currently, since it uses
http.client.parse_headers which expects a bytes stream.

If there's agreement on these points, I think it would be important to
get these changes (or perhaps alternate fixes) into Python 3.1; I know
that some of the changes are backwards incompatible with 3.0, but I
think that the encoding issues in the current cgi module make it very
difficult to work with.  I'm willing to take responsibility for
submitting bug reports and patches, but could probably use a more
experienced mentor to let me know if I'm doing it wrong.

If you don't think that these changes are reasonable, I'm interested
to hear your alternate suggestions.  I strongly believe that the
current behavior is broken and needs to be changed for 3.1.

Thanks for your consideration,
Miles Kaufmann

From milesck at umich.edu  Sun Apr 12 03:41:51 2009
From: milesck at umich.edu (Miles Kaufmann)
Date: Sat, 11 Apr 2009 21:41:51 -0400
Subject: [Web-SIG] Python 3: Form data encoding issues in cgi and urllib
	modules
In-Reply-To: <5ec9495f0904111748p49ad255bib898d41e05e57d3d@mail.gmail.com>
References: <5ec9495f0904111748p49ad255bib898d41e05e57d3d@mail.gmail.com>
Message-ID: <5ec9495f0904111841k4075c17cl6bd5ef1dbc17595a@mail.gmail.com>

On Sat, Apr 11, 2009 at 8:48 PM, Miles Kaufmann wrote:
> ...
> It's possible that the email.mime and http packages might also need
> some changes made, but I haven't looked into those as much.
> ...

Apparently there's been some discussion on the python-dev and
email-sig lists in the past couple of days since I last checked, about
the email package and strings and bytes.  So it might be the case that
the cgi module will build on top of those decisions.  But I want to
make sure that the cgi module isn't left behind, and I think that
having FieldStorage built from string streams instead of byte
streams is a mistake that should be rectified ASAP.

On Fri, Apr 10, 2009 at 12:35 PM, Bill Janssen wrote [1]:
> Barry Warsaw <barry at python.org> wrote:
>> In that case, we really need the
>> bytes-in-bytes-out-bytes-in-the-chewy-
>> center API first, and build things on top of that.
>
> Yep.

-Miles Kaufmann

[1] http://mail.python.org/pipermail/email-sig/2009-April/000438.html

From ogbujic at ccf.org  Mon Apr 13 16:40:27 2009
From: ogbujic at ccf.org (Chimezie Ogbuji)
Date: Mon, 13 Apr 2009 10:40:27 -0400
Subject: [Web-SIG] Closing long-running WSGI requests (possible?)
Message-ID: <C608C71B.99F3%ogbujic@ccf.org>

Hello.  I have a problem with a WSGI-based SPARQL server that I have been
unable to resolve for some time.  I was told this is the best place to ask
:).  I'm building a SPARQL [1] server that is deployed as a WSGI/Paste
server.  SPARQL queries are handled by the server and evaluated against a
MySQL database using mysql-python/MySQLdb to manage the connection.

My goal is to be able to allow clients to close the connection in order to
kill queries that have been dispatched (in order to 'abort' them).
Unfortunately, when the client kills the connection, the application is not
signaled in any way.  So, the result is that (for long-running queries), the
MySQL query continues to run even after the connection is closed (by
clicking cancel in the browser for instance).

I would expect that when the connection is closed at the client side, this
should trigger a chain reaction of garbage collection (deletion of the
application object, and all the objects attributed to it including the DB
connection, etc.) that bottoms out in the db connection closing and MySQLdb
killing the query as a side effect of calling __del__ on the cursor and
database connection.  However, this is not what is happening and it appears
that once the result is served back to the client, the server and the
client are completely 'disconnected' for that particular request.

Am I going about this the wrong way? Does WSGI simply not have anything to
say about such a situation? If the problem isn't
WSGI, is there another WSGI implementation that is known to behave as
expected (i.e., closing the connection dispatches the deletion of the
objects involved in the request handling)?

I was told to look into keep-alive, but the specification doesn't seem to
suggest that this would help me as it has more to do with re-using
connections for subsequent requests rather than specifying that the server
maintains a connection between the request and the objects involved in
handling the request at the server.

Any help would be greatly appreciated.

Thanks

[1] http://www.w3.org/TR/rdf-sparql-query/



From christian at dowski.com  Mon Apr 13 16:53:10 2009
From: christian at dowski.com (Christian Wyglendowski)
Date: Mon, 13 Apr 2009 10:53:10 -0400
Subject: [Web-SIG] Closing long-running WSGI requests (possible?)
In-Reply-To: <C608C71B.99F3%ogbujic@ccf.org>
References: <C608C71B.99F3%ogbujic@ccf.org>
Message-ID: <d66b7a7f0904130753s50ca6995r1a07db111f4d2d4a@mail.gmail.com>

On Mon, Apr 13, 2009 at 10:40 AM, Chimezie Ogbuji <ogbujic at ccf.org> wrote:
> Hello.  I have a problem with a WSGI-based SPARQL server that I have been
> unable to resolve for some time.  I was told this is the best place to ask
> :).  I'm building a SPARQL [1] server that is deployed as a WSGI/Paste
> server.  SPARQL queries are handled by the server and evaluated against a
> MySQL database using mysql-python/MySQLdb to manage the connection.
>
> My goal is to be able to allow clients to close the connection in order to
> kill queries that have been dispatched (in order to 'abort' them).

This should be doable from what I understand.  From PEP 333:

"If the iterable returned by the application has a close() method, the
server or gateway must call that method upon completion of the current
request, whether the request was completed normally, or terminated
early due to an error. (This is to support resource release by the
application. This protocol is intended to complement PEP 325's
generator support, and other common iterables with close() methods.)"
[1]

So it sounds like you could add a close method on whatever iterable
that your application returns and have it do the required resource
release there.
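
A minimal sketch of that idea, assuming a stand-in FakeConnection class
in place of a real database connection (both class names here are
hypothetical). Whether close() is called promptly when a client
disconnects is up to the server.

```python
class FakeConnection:
    """Stand-in for a real database connection (hypothetical)."""

    def __init__(self):
        self.open = True

    def close(self):
        self.open = False


class QueryResponse:
    """Response iterable whose close() releases the connection."""

    def __init__(self, rows, connection):
        self._rows = iter(rows)
        self.connection = connection

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._rows)

    def close(self):
        # Per PEP 333, the server or gateway calls this when the request
        # is finished, whether it completed normally or ended early.
        self.connection.close()


def application(environ, start_response):
    connection = FakeConnection()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return QueryResponse([b"row 1\n", b"row 2\n"], connection)
```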

HTH,

Christian
http://www.dowski.com

[1] http://www.python.org/dev/peps/pep-0333/#specification-details

From ionel.mc at gmail.com  Mon Apr 13 18:01:09 2009
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Mon, 13 Apr 2009 19:01:09 +0300
Subject: [Web-SIG] Closing long-running WSGI requests (possible?)
In-Reply-To: <d66b7a7f0904130753s50ca6995r1a07db111f4d2d4a@mail.gmail.com>
References: <C608C71B.99F3%ogbujic@ccf.org>
	<d66b7a7f0904130753s50ca6995r1a07db111f4d2d4a@mail.gmail.com>
Message-ID: <b322b4e60904130901j474162d4h2c919b3627eefc85@mail.gmail.com>

That implies one would have extremely reliable TCP connections, and that
clients gracefully shut down the connection and the server is notified of
that.

Most of the time that doesn't happen, and the solution is to continuously
send keepalive packets (some small string or whatever) - I'm assuming you
run a batch or set of queries and you can interleave yielding some data
while you run that batch.

For example, if your client disconnects and the server tries to send some
data, it would fail - and trigger closing the app iterable.

In contrast, a server that just runs some backend processing without moving
any data around doesn't have any way to know if the connection is still
valid.

Then again, even if the client properly shuts down the connection, the
server won't do anything about it if it doesn't try to do anything with the
socket, due to the synchronous nature (I'm assuming) of the whole
server/app.
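
Roughly, the interleaving could look like the sketch below. The names
heartbeat_response, run_query and cancel_query are hypothetical, the
padding byte must be harmless for the response's content type, and it
still depends on the server calling close() on the generator once a
write fails. Error handling for a failing run_query() is omitted.

```python
import threading

def heartbeat_response(run_query, cancel_query, interval=1.0):
    """Yield padding while run_query() works; cancel if we get closed."""
    result = []
    worker = threading.Thread(target=lambda: result.append(run_query()))
    worker.daemon = True
    worker.start()
    try:
        while worker.is_alive():
            worker.join(interval)
            if worker.is_alive():
                # Small write that gives the server a chance to notice
                # that the socket is dead.
                yield b" "
        yield result[0]
    finally:
        # Reached on normal completion, and also when the server calls
        # close() on this generator (GeneratorExit is raised at the yield).
        if worker.is_alive():
            cancel_query()
```

An application would return heartbeat_response(...) as its response
iterable after calling start_response.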

-- ionel



From arw1961 at yahoo.com  Mon Apr 13 20:13:05 2009
From: arw1961 at yahoo.com (Aaron Watters)
Date: Mon, 13 Apr 2009 11:13:05 -0700 (PDT)
Subject: [Web-SIG] Closing long-running WSGI requests (possible?)
Message-ID: <734289.37918.qm@web32004.mail.mud.yahoo.com>


I agree with Ionel

I personally wouldn't rely on "kill wsgi request".
I'd run the update in a subprocess and kill the subprocess
using a signal when the user requests (on unix, of course).
I'd also check a log written by the subprocess to see
whether it completed or not.
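
A bare-bones sketch of the subprocess approach, Unix only. The child
command here is just a placeholder that sleeps; a real deployment would
launch whatever runs the query, and would also write the log mentioned
above.

```python
import signal
import subprocess
import sys

# Hypothetical long-running job standing in for the real query runner.
job = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Later, when the user asks to abort:
job.send_signal(signal.SIGTERM)
exit_status = job.wait()

# On Unix a negative return code means the child was ended by that signal.
print("killed by signal" if exit_status < 0 else "exited on its own")
```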

If you "kill the wsgi request" you have the problem
of not being quite sure whether the kill arrived in time,
among other possible difficulties, some mentioned by Ionel.

  -- Aaron Watters
     http://aaron.oirt.rutgers.edu/myapp/docs/W0500.quickstart

(apologies to Christian, who got this twice, I forgot to "reply all")



From graham.dumpleton at gmail.com  Mon Apr 13 22:58:22 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Tue, 14 Apr 2009 06:58:22 +1000
Subject: [Web-SIG] Closing long-running WSGI requests (possible?)
In-Reply-To: <C608C71B.99F3%ogbujic@ccf.org>
References: <Acm8RckixRTBJNuLEkGErXO3cEw/cQ==> <C608C71B.99F3%ogbujic@ccf.org>
Message-ID: <88e286470904131358t1b9c8ab5we9fed25e656ae414@mail.gmail.com>

No, cannot really be done. This has been discussed a couple of times
on the mod_wsgi list. One such discussion is at:

  http://groups.google.com/group/modwsgi/browse_frm/thread/8ebd9aca9d317ac9

In general the same issues apply to all WSGI implementations.

Graham


From manlio_perillo at libero.it  Mon Apr 13 23:58:48 2009
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 13 Apr 2009 23:58:48 +0200
Subject: [Web-SIG] Closing long-running WSGI requests (possible?)
In-Reply-To: <C608C71B.99F3%ogbujic@ccf.org>
References: <C608C71B.99F3%ogbujic@ccf.org>
Message-ID: <49E3B598.2000803@libero.it>

Chimezie Ogbuji wrote:
> Hello.  I have a problem with a WSGI-based SPARQL server that I have been
> unable to resolve for some time.  I was told this is the best place to ask
> :).  I'm building a SPARQL [1] server that is deployed as  WSGI/Paste
> server.  SPARQL queries are handled by the server and evaluated against a
> MySQL database using mysql-python/MySQLdb to manage the connection.
> 
> My goal is to be able to allow clients to close the connection in order to
> kill queries that have been dispatched (in order to 'abort' them).
> Unfortunately, when the client kills the connection, the application is not
> signaled in any way.  So, the result is that (for long-running queries), the
> MySQL query continues to run even after the connection is closed (by
> clicking cancel in the browser for instance).
> 
> [...]

What you want to do is not possible.

A more viable solution is to use JavaScript.
Add a custom "abort button" to the web page, with a function
associated with its "click" event.

You should also associate a function with the "unload" event (where you
can check whether there are active queries).

In the JavaScript function you can issue an XMLHttpRequest, using a
unique identifier.

Note that if you use PostgreSQL, you can use:
http://www.postgresql.org/docs/8.3/interactive/protocol-flow.html#AEN73870

When you create a connection to PostgreSQL, the server will send you the
backend process ID and a unique key.

You can use this data to send a cancellation request.
All you need to do is to pass the process id and the unique key to the
client (with some encryption so that the client can use the data only once).

Unfortunately, libpq does not offer a flexible interface to this feature.
The PGcancel structure is opaque, so you need some hacking.
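
The abort-button approach can be sketched on the server side roughly as
follows. This is only a minimal sketch under stated assumptions: the
QueryRegistry class, the abort_app endpoint and the idea of posting the
token as the request body are all hypothetical names invented for
illustration, and the cancel callable stands in for whatever the database
driver offers (for MySQL it might issue KILL QUERY from a second
connection).

```python
import threading
import uuid


class QueryRegistry:
    """Maps a one-time token to a callable that cancels a running query."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancellers = {}

    def register(self, cancel):
        # Called by the request that starts the query; the token is handed
        # to the browser so the abort button can send it back.
        token = uuid.uuid4().hex
        with self._lock:
            self._cancellers[token] = cancel
        return token

    def finish(self, token):
        # Called when the query completes normally.
        with self._lock:
            self._cancellers.pop(token, None)

    def abort(self, token):
        # Each token can be used at most once.
        with self._lock:
            cancel = self._cancellers.pop(token, None)
        if cancel is None:
            return False
        cancel()
        return True


registry = QueryRegistry()


def abort_app(environ, start_response):
    """Hypothetical WSGI abort endpoint: the request body is the token."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    token = environ["wsgi.input"].read(length).decode("ascii", "replace")
    found = registry.abort(token)
    status = "204 No Content" if found else "404 Not Found"
    start_response(status, [("Content-Length", "0")])
    return [b""]
```

The query-handling request would call registry.register() before running
the query and registry.finish() afterwards; the JavaScript handler then
only needs to POST the token to wherever abort_app is mounted.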


Manlio Perillo

From davidgshi at yahoo.co.uk  Tue Apr 14 12:29:37 2009
From: davidgshi at yahoo.co.uk (David Shi)
Date: Tue, 14 Apr 2009 10:29:37 +0000 (GMT)
Subject: [Web-SIG] RESTful Python email list?
Message-ID: <420162.76101.qm@web26306.mail.ukl.yahoo.com>

I am using Python and promoting the use of Python. I am now interested in finding good demos on generating tokens dynamically and using JavaScript to call RESTful services with the token embedded.

Regards.

David

--- On Sat, 11/4/09, Jim Fulton <jim at zope.com> wrote:


From: Jim Fulton <jim at zope.com>
Subject: Re: [Web-SIG] RESTful Python email list?
To: "Pete" <pfein at pobox.com>
Cc: web-sig at python.org
Date: Saturday, 11 April, 2009, 3:01 PM


On Apr 10, 2009, at 5:12 PM, Pete wrote:

> This came up at the REST BoF at Pycon...
> 
> Any interest in a dedicated email list for REST + python, a la the restful-json group [0]?  The group would discuss strategies for REST architecture built with and within Python.  WSGI 1.0 vs. 2.0 vs. 2e6 is out of scope. ;-)


-1

I'd be happy to see the discussions here.

Jim

--
Jim Fulton
Zope Corporation



From davidgshi at yahoo.co.uk  Wed Apr 15 15:38:54 2009
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 15 Apr 2009 13:38:54 +0000 (GMT)
Subject: [Web-SIG] Python-generating tokens dynamically at runtime
Message-ID: <35827.24291.qm@web26301.mail.ukl.yahoo.com>

I am using Python and promoting the use of Python. 

I am now interested in finding good demos on generating tokens dynamically and using JavaScript to call RESTful services with the token embedded.

Regards.

David



From milesck at umich.edu  Wed Apr 15 23:16:08 2009
From: milesck at umich.edu (Miles Kaufmann)
Date: Wed, 15 Apr 2009 17:16:08 -0400
Subject: [Web-SIG] Python 3: Form data encoding issues in cgi and urllib
	modules
In-Reply-To: <5ec9495f0904111748p49ad255bib898d41e05e57d3d@mail.gmail.com>
References: <5ec9495f0904111748p49ad255bib898d41e05e57d3d@mail.gmail.com>
Message-ID: <5ec9495f0904151416l3f5705a9ufd695c02da3cfebf@mail.gmail.com>

On Sat, Apr 11, 2009 at 8:48 PM, Miles Kaufmann wrote:
> The first issue is that there doesn't seem to be a way to parse
> x-www-form-urlencoded query strings in a character set other than
> UTF-8, for example:
>
> 'premier=un&deuxi%E8me=deux' # latin-1
>
> The urllib.parse.unquote* functions take encoding and errors
> parameters, but none of the higher-level ones.  The solution to me
> seems to be that functions that build on top of
> it--urllib.parse.parse*, cgi.parse*, and the cgi.FieldStorage
> constructor--should grow encoding and errors parameters that they pass
> through to the lower-level functions.
>
> The second issue is that the FieldStorage classes work with text input
> streams.  However, with multipart/form-data posts, posted files aren't
> necessarily in the same encoding as form fields, or may be binary and
> not text at all.  I would suggest that FieldStorage should be changed
> to take a binary input stream.
>
> [...]

I'm not quite sure how to interpret the lack of response I've gotten
on this topic.  Is it just that there's little interest in the cgi
module?  Should I raise this issue on the python-dev list, or just
open a bug report and start submitting patches?

There's been a lot of discussion recently about bytes vs. str in email
headers and WSGI environ variables, but I haven't been able to find a
substantive discussion on this specific topic.  Here are some of the
related quotes I've come across.

Martin v. Löwis wrote [1]:
> In a CGI application, you shouldn't be using sys.stdin or print().
> Instead, you should be using sys.stdin.buffer (or sys.stdin.buffer.raw),
> and sys.stdout.buffer.raw. A CGI script essentially does binary IO;
> if you use TextIO, there likely will be bugs (e.g. if you have
> attachments of type application/octet-stream).

bobince wrote [2]:
> Evan Fosmark wrote:
>> bobince wrote:
>>> So yeah, it's a bug in cgi.py, yet another victim of 2to3 conversion
>>> that hasn't been fixed properly for the new string model. It should
>>> be converting the incoming byte stream to characters before
>>> passing them to urllib.
>>>
>>> Did I mention Python 3.0's libraries (especially web-related
>>> ones) still being rather shonky? :-)
>>
>> Yeah. So far I've noticed huge problems with cgi, urllib, and
>> wsgiref. I hope they get fixed soon. :(
>
> Indeed. Momentum in WEB-SIG seems to have ground to a halt; no-one
> seems to want ownership of the issue. Very disappointing.

There's also this bug report[3], but it doesn't directly propose the
changes that I have.

So: does anyone agree, or disagree, that cgi.FieldStorage should be
changed to take byte streams, and many of the cgi and urllib.parse
functions should become encoding-aware, preferably in time for Python
3.1?  The byte-stream change will break compatibility with Python
3.0, but I strongly feel that treating POST data as text is wrong and
should not continue to be supported.

-Miles Kaufmann

[1]: http://mail.python.org/pipermail/python-dev/2009-April/088727.html
[2]: http://stackoverflow.com/questions/540342/python-3-0-urllib
[3]: http://bugs.python.org/issue4953
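
To make the first issue concrete, here is a small sketch using only the
lower-level unquote functions, which already take an encoding. The
variable names are illustrative only.

```python
from urllib.parse import unquote, unquote_to_bytes

raw = "deuxi%E8me"

# Default decoding is UTF-8 with errors="replace": the lone 0xE8 byte is
# not valid UTF-8, so it is replaced with U+FFFD.
as_utf8 = unquote(raw)

# Passing the right charset recovers the intended character.
as_latin1 = unquote(raw, encoding="latin-1")

# The bytes variant does no decoding at all and leaves the choice of
# charset to the caller.
as_bytes = unquote_to_bytes(raw)

print(ascii(as_utf8), ascii(as_latin1), as_bytes)
```

A higher-level parser that calls unquote() without forwarding an
encoding is stuck with the first behaviour.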

From graham.dumpleton at gmail.com  Wed Apr 15 23:23:59 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 16 Apr 2009 07:23:59 +1000
Subject: [Web-SIG] Python 3: Form data encoding issues in cgi and urllib
	modules
In-Reply-To: <5ec9495f0904151416l3f5705a9ufd695c02da3cfebf@mail.gmail.com>
References: <5ec9495f0904111748p49ad255bib898d41e05e57d3d@mail.gmail.com>
	<5ec9495f0904151416l3f5705a9ufd695c02da3cfebf@mail.gmail.com>
Message-ID: <88e286470904151423h663102bdk49eaef5c258ae33f@mail.gmail.com>

2009/4/16 Miles Kaufmann <milesck at umich.edu>:
> On Sat, Apr 11, 2009 at 8:48 PM, Miles Kaufmann wrote:
>> The first issue is that there doesn't seem to be a way to parse
>> x-www-form-urlencoded query strings in a character set other than
>> UTF-8, for example:
>>
>> 'premier=un&deuxi%E8me=deux' # latin-1
>>
>> The urllib.parse.unquote* functions take encoding and errors
>> parameters, but none of the higher-level ones.  The solution to me
>> seems to be that functions that build on top of
>> it--urllib.parse.parse*, cgi.parse*, and the cgi.FieldStorage
>> constructor--should grow encoding and errors parameters that they pass
>> through to the lower-level functions.
>>
>> The second issue is that the FieldStorage classes work with text input
>> streams.  However, with multipart/form-data posts, posted files aren't
>> necessarily in the same encoding as form fields, or may be binary and
>> not text at all.  I would suggest that FieldStorage should be changed
>> to take a binary input stream.
>>
>> [...]
>
> I'm not quite sure how to interpret the lack of response I've gotten
> on this topic.  Is it just that there's little interest in the cgi
> module?  Should I raise this issue on the python-dev list, or just
> open a bug report and start submitting patches?
>
> There's been a lot of discussion recently about bytes vs. str in email
> headers and WSGI environ variables, but I haven't been able to find a
> substantive discussion on this specific topic.  Here are some of the
> related quotes I've come across.
>
> Martin v. Löwis wrote [1]:
>> In a CGI application, you shouldn't be using sys.stdin or print().
>> Instead, you should be using sys.stdin.buffer (or sys.stdin.buffer.raw),
>> and sys.stdout.buffer.raw. A CGI script essentially does binary IO;
>> if you use TextIO, there likely will be bugs (e.g. if you have
>> attachments of type application/octet-stream).
>
> bobince wrote [2]:
>> Evan Fosmark wrote:
>>> bobince wrote:
>>>> So yeah, it's a bug in cgi.py, yet another victim of 2to3 conversion
>>>> that hasn't been fixed properly for the new string model. It should
>>>> be converting the incoming byte stream to characters before
>>>> passing them to urllib.
>>>>
>>>> Did I mention Python 3.0's libraries (especially web-related
>>>> ones) still being rather shonky? :-)
>>>
>>> Yeah. So far I've noticed huge problems with cgi, urllib, and
>>> wsgiref. I hope they get fixed soon. :(
>>
>> Indeed. Momentum in WEB-SIG seems to have ground to a halt; no-one
>> seems to want ownership of the issue. Very disappointing.
>
> There's also this bug report[3], but it doesn't directly propose the
> changes that I have.
>
> So: does anyone agree, or disagree, that cgi.FieldStorage should be
> changed to take byte streams, and many of the cgi and urllib.parse
> functions should become encoding-aware, preferably in time for Python
> 3.1?  The byte-stream change will break compatibility with Python
> 3.0, but I strongly feel that treating POST data as text is wrong and
> should not continue to be supported.
>
> -Miles Kaufmann
>
> [1]: http://mail.python.org/pipermail/python-dev/2009-April/088727.html
> [2]: http://stackoverflow.com/questions/540342/python-3-0-urllib
> [3]: http://bugs.python.org/issue4953

Have you read:

  http://bugs.python.org/issue3300

This was referenced in a prior post here and is likely relevant. A lot
of the discussion for that was happening on developers list for Python
3.0.

Not sure why someone was taking issue with WEB-SIG list over cgi
FieldStorage issues as I don't recollect us having any substantive
discussion about it and any problems it has.

Graham

From milesck at umich.edu  Thu Apr 16 00:26:47 2009
From: milesck at umich.edu (Miles Kaufmann)
Date: Wed, 15 Apr 2009 18:26:47 -0400
Subject: [Web-SIG] Python 3: Form data encoding issues in cgi and urllib
	modules
In-Reply-To: <88e286470904151423h663102bdk49eaef5c258ae33f@mail.gmail.com>
References: <5ec9495f0904111748p49ad255bib898d41e05e57d3d@mail.gmail.com>
	<5ec9495f0904151416l3f5705a9ufd695c02da3cfebf@mail.gmail.com>
	<88e286470904151423h663102bdk49eaef5c258ae33f@mail.gmail.com>
Message-ID: <5ec9495f0904151526l35aaeb1cl57bddf4b82ccb4ff@mail.gmail.com>

On Wed, Apr 15, 2009 at 5:23 PM, Graham Dumpleton wrote:
> 2009/4/16 Miles Kaufmann <milesck at umich.edu>:
>> So: does anyone agree, or disagree, that cgi.FieldStorage should be
>> changed to take byte streams, and many of the cgi and urllib.parse
>> functions should become encoding-aware, preferably in time for Python
>> 3.1?  The byte-stream change will break compatibility with Python
>> 3.0, but I strongly feel that treating POST data as text is wrong and
>> should not continue to be supported.
>
> Have you read:
>
>  http://bugs.python.org/issue3300
>
> This was referenced in a prior post here and is likely relevant. A lot
> of the discussion for that was happening on developers list for Python
> 3.0.

I hadn't. Thanks for the link! That was a long read, so apologies if I
missed anything, but that discussion seems to pertain almost entirely
to the urllib.parse.[un]quote* functions; there was only one point
where it was mentioned that there would be issues with non-UTF-8 data
for higher-level functions[1], and nothing followed from that.

I don't think it should be a controversial move to add encoding and
errors parameters to the following functions:

* urllib.parse.parse_qs
* urllib.parse.parse_qsl
* urllib.parse.urlencode

which, I feel, would be in line with the outcome of the discussion you
referenced, shouldn't break any existing code, and would make it
possible to parse the "quite prevalent"[2] instances of non-utf-8
query strings like the following:

'premier=un&deuxi%E8me=deux' # latin-1

The parameters would also need to be added to cgi.parse,
cgi.parse_multipart, and cgi.FieldStorage, if they were in fact
changed to expect a bytes file input, as I suggest.

> Not sure why someone was taking issue with WEB-SIG list over cgi
> FieldStorage issues as I don't recollect us having any substantive
> discussion about it and any problems it has.

Exactly; that person's issue was that there hasn't been substantive
discussion.  Which is what I'm trying to create now. :)

-Miles Kaufmann

[1]: http://bugs.python.org/msg70970
[2]: http://lists.w3.org/Archives/Public/www-international/2008JulSep/0042.html
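
For readers on current Python 3 releases: the encoding and errors
parameters proposed above are accepted there by parse_qs, parse_qsl and
urlencode. A short sketch of parsing and regenerating the latin-1 query
string from this thread (variable names are illustrative only):

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

query = "premier=un&deuxi%E8me=deux"  # percent-encoded latin-1

# Decode the percent-escapes as latin-1 instead of the UTF-8 default.
fields = parse_qs(query, encoding="latin-1", errors="strict")
pairs = parse_qsl(query, encoding="latin-1", errors="strict")

# Encoding with the same charset reproduces the original query string.
round_trip = urlencode(pairs, encoding="latin-1")

print(fields)
print(round_trip)
```

Using errors="strict" makes a wrong charset guess fail loudly instead of
silently inserting replacement characters.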

From graham.dumpleton at gmail.com  Thu Apr 16 09:12:11 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 16 Apr 2009 17:12:11 +1000
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6407CC648F@ex10.hostedexchange.local>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407CC648F@ex10.hostedexchange.local>
Message-ID: <88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>

2009/4/4 Robert Brewer <fumanchu at aminus.org>:
> Alan Kennedy wrote:
>> [Bill]
>> > I think the controlling reference here is RFC 3875.
>>
>> I think the controlling references are RFC 2616, RFC 2396 and RFC
>> 3987.
>>
>> RFC 2616, the HTTP 1.1 spec, punts on the question of character
>> encoding for the request URI.
>>
>> RFC 2396, the URI spec, says
>>
>> """
>>    It is expected that a systematic treatment of character encoding
>>    within URI will be developed as a future modification of this
>>    specification.
>> """
>>
>> RFC 3987 is that spec, for Internationalized Resource Identifiers. It
>> says
>>
>> """
>> An IRI is a sequence of characters from the Universal Character Set
>> (Unicode/ISO 10646).
>> """
>>
>> and
>>
>> """
>> 1.2.  Applicability
>>
>>    IRIs are designed to be compatible with recommendations for new URI
>>    schemes [RFC2718].  The compatibility is provided by specifying a
>>    well-defined and deterministic mapping from the IRI character
>>    sequence to the functionally equivalent URI character sequence.
>>    Practical use of IRIs (or IRI references) in place of URIs (or URI
>>    references) depends on the following conditions being met:
>> """
>>
>> followed by
>>
>> """
>>    c.  The URI corresponding to the IRI in question has to encode
>>        original characters into octets using UTF-8.  For new URI
>>        schemes, this is recommended in [RFC2718].  It can apply to a
>>        whole scheme (e.g., IMAP URLs [RFC2192] and POP URLs [RFC2384],
>>        or the URN syntax [RFC2141]).  It can apply to a specific part of
>>        a URI, such as the fragment identifier (e.g., [XPointer]).  It
>>        can apply to a specific URI or part(s) thereof.  For details,
>>        please see section 6.4.
>> """
>>
>> I think the question is "are people using IRIs in the wild"? If so,
>> then we must decide how best to deal with the problems of
>> recognising iso-8859-1+rfc2047 versus utf-8, or whatever
>> server-configured encoding the user has chosen.
>
> Agreed. The Request-URI needs to handle IRI's. The headers mostly
> don't--almost all headers are mostly of type "token", which is US-ASCII.
> A few are of type "TEXT", which is ISO-8859-1/RFC 2047. The remaining
> (sub)values are mostly custom byte sequences:
>
> field-name           field-value
> ----------           -----------
> Accept               token
> Accept-Charset       token
> Accept-Encoding      token
> Accept-Language      ALPHA, plus ":", "=", "q" etc
> Accept-Ranges        token
> Age                  DIGIT
> Allow                token
> Authorization        token
> Cache-Control        token
> Connection           token
> Content-Encoding     token
> Content-Language     ALPHA
> Content-Length       DIGIT
> Content-Location     absoluteURI | relativeURI
> Content-MD5          base64 of 128 bit md5 digest
> Content-Range        DIGIT, plus "/" etc
> Content-Type         token
> Date                 HTTP-date
> ETag                 TEXT and CHAR
> Expect               token, quoted-string
> Expires              HTTP-date
> From                 ASCII (see RFC 822)
> Host                 host ":" port
> If-Match             TEXT and CHAR
> If-Modified-Since    HTTP-date
> If-None-Match        TEXT and CHAR
> If-Range             TEXT and CHAR | HTTP-date
> If-Unmodified-Since  HTTP-date
> Last-Modified        HTTP-date
> Location             absoluteURI
> Max-Forwards         DIGIT
> Pragma               token, quoted-string
> Proxy-Authenticate   token
> Proxy-Authorization  token
> Range                token
> Referer              absoluteURI | relativeURI
> Retry-After          HTTP-date | DIGIT
> Server               token, TEXT
> TE                   token
> Trailer              token
> Transfer-Encoding    token
> Upgrade              token
> User-Agent           token, TEXT
> Vary                 token
> Via                  token, host, port
> Warning              quoted-string, HTTP-date, host, port
> WWW-Authenticate     token
>
>
> The Content-Location, Location, and Referer headers are problematic
> since HTTP borrows those from the URI spec, which deals in characters
> and not bytes, as you mentioned. Host, and maybe Via, are also special
> due to possible IDNA-encoding.
>
> Regarding extension headers, I think we should assume that the HTTP/1.1
> spec implies all headers should be token (ASCII) or TEXT (ISO-8859-1).
> From section 4.2:
>
>    field-content  = <the OCTETs making up the field-value
>                     and consisting of either *TEXT or combinations
>                     of token, separators, and quoted-string>
>
> In addition, the httpbis effort seems to be enforcing this even more
> strongly [1]:
>
>     message-header = field-name ":" OWS [ field-value ] OWS
>     field-name     = token
>     field-value    = *( field-content / OWS )
>     field-content  = *( WSP / VCHAR / obs-text )
>
>   Historically, HTTP has allowed field-content with text in the ISO-
>   8859-1 [ISO-8859-1] character encoding (allowing other character sets
>   through use of [RFC2047] encoding).  In practice, most HTTP header
>   field-values use only a subset of the US-ASCII charset [USASCII].
>   Newly defined header fields SHOULD constrain their field-values to
>   US-ASCII characters.  Recipients SHOULD treat other (obs-text) octets
>   in field-content as opaque data.
>
> So, from where I sit, we have:
>
>  1. Many header values which are ASCII.
>  2. A few header values which are ISO-8859-1 plus RFC 2047.
>  3. A few header values which are URI's (no specified encoding) or IRI's
>     (UTF-8).
>
> I understand the desire to decode ASAP, and I agree with Guido that we
> should use a default encoding which the app can override. Looking at the
> above, ISO-8859-1 is the best encoding I know of for all three header
> cases. ASCII can be used as a valid subset without transcoding; headers
> which are ISO-8859-1 are decoded perfectly; URI/IRI headers can be
> transcoded by the app if needed, but mangled opaquely by middleware.
>
> If we make *that* call, then IMO there's no reason not to do the same to
> SCRIPT_NAME, PATH_INFO, and QUERY_STRING.

I am not sure we ended up with a final answer on all of this, but I
don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
any longer. As such, am implementing things as per:

  http://www.wsgi.org/wsgi/Amendments_1.0

with exception that will not be attempting to do decoding per RFC
2047. Any CGI variables not related to HTTP headers will also be
handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
This should be equivalent with what wsgiref does in Python 3.X and
basically keeps the status quo.

If anyone has any last things to say on all of this, please speak up now.

Graham
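
As an illustration of why latin-1 is a workable default here: latin-1
maps every byte to exactly one code point, so an application can always
get the original bytes back and decode them with a charset of its own
choosing. A minimal sketch, assuming a UTF-8 encoded path (the recode
helper and the sample environ are hypothetical, not part of any WSGI
server):

```python
def recode(value, encoding="utf-8", errors="strict"):
    """Recover the wire bytes from a latin-1-decoded str and decode again."""
    return value.encode("latin-1").decode(encoding, errors)


# What a server following the scheme above would do with a UTF-8 path:
wire_bytes = b"/caf\xc3\xa9"
environ = {"PATH_INFO": wire_bytes.decode("latin-1")}

# The application sees mojibake until it transcodes.
path = recode(environ["PATH_INFO"])

print(ascii(environ["PATH_INFO"]), ascii(path))
```

Nothing is lost by the server's initial decode; only the interpretation
of the bytes is deferred to the application.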

From fumanchu at aminus.org  Thu Apr 16 18:33:58 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Thu, 16 Apr 2009 09:33:58 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>
References: <88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>
Message-ID: <1239899638.19337.5.camel@haku>

On Thu, 2009-04-16 at 00:12 -0700, Graham Dumpleton wrote:
> > So, from where I sit, we have:
> >
> >  1. Many header values which are ASCII.
> >  2. A few header values which are ISO-8859-1 plus RFC 2047.
> >  3. A few header values which are URI's (no specified encoding) or
> >     IRI's (UTF-8).
> >
> > I understand the desire to decode ASAP, and I agree with Guido that
> > we should use a default encoding which the app can override. Looking
> > at the above, ISO-8859-1 is the best encoding I know of for all three
> > header cases. ASCII can be used as a valid subset without transcoding;
> > headers which are ISO-8859-1 are decoded perfectly; URI/IRI headers
> > can be transcoded by the app if needed, but mangled opaquely by
> > middleware.
> >
> > If we make *that* call, then IMO there's no reason not to do the
> > same to SCRIPT_NAME, PATH_INFO, and QUERY_STRING.
> 
> I am not sure we ended up with a final answer on all of this, but I
> don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
> any longer. As such, am implementing things as per:
> 
>   http://www.wsgi.org/wsgi/Amendments_1.0
> 
> with exception that will not be attempting to do decoding per RFC
> 2047. Any CGI variables not related to HTTP headers will also be
> handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
> This should be equivalent with what wsgiref does in Python 3.X and
> basically keeps the status quo.
> 
> If anyone has any last things to say on all of this, please speak up
> now.
> 
That sounds fine to me, Graham, and is what I'll be implementing in my
python3 branch for CherryPy barring any unforeseen impediments.


Robert Brewer
fumanchu at aminus.org


From foom at fuhm.net  Thu Apr 16 21:31:18 2009
From: foom at fuhm.net (James Y Knight)
Date: Thu, 16 Apr 2009 15:31:18 -0400
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407CC648F@ex10.hostedexchange.local>
	<88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>
Message-ID: <052B16D8-D0F6-4D85-9A39-9BA9F2F544EA@fuhm.net>

On Apr 16, 2009, at 3:12 AM, Graham Dumpleton wrote:
> I am not sure we ended up with a final answer on all of this, but I
> don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
> any longer. As such, am implementing things as per:
>
>  http://www.wsgi.org/wsgi/Amendments_1.0
>
> with exception that will not be attempting to do decoding per RFC
> 2047. Any CGI variables not related to HTTP headers will also be
> handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
> This should be equivalent with what wsgiref does in Python 3.X and
> basically keeps the status quo.
>
> If anyone has any last things to say on all of this, please speak up  
> now.


IMO it would make more sense to have the headers be bytes instead of  
strings decoded/encoded with latin-1, but it's not a huge deal...

James

From graham.dumpleton at gmail.com  Fri Apr 17 01:03:27 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 17 Apr 2009 09:03:27 +1000
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <052B16D8-D0F6-4D85-9A39-9BA9F2F544EA@fuhm.net>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407CC648F@ex10.hostedexchange.local>
	<88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>
	<052B16D8-D0F6-4D85-9A39-9BA9F2F544EA@fuhm.net>
Message-ID: <88e286470904161603w459683f0n310334e7a101ff6b@mail.gmail.com>

2009/4/17 James Y Knight <foom at fuhm.net>:
> On Apr 16, 2009, at 3:12 AM, Graham Dumpleton wrote:
>>
>> I am not sure we ended up with a final answer on all of this, but I
>> don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
>> any longer. As such, am implementing things as per:
>>
>>  http://www.wsgi.org/wsgi/Amendments_1.0
>>
>> with exception that will not be attempting to do decoding per RFC
>> 2047. Any CGI variables not related to HTTP headers will also be
>> handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
>> This should be equivalent with what wsgiref does in Python 3.X and
>> basically keeps the status quo.
>>
>> If anyone has any last things to say on all of this, please speak up now.
>
>
> IMO it would make more sense to have the headers be bytes instead of strings
> decoded/encoded with latin-1, but it's not a huge deal...

It is a huge deal in as much as we don't use any sort of formal voting
process here and for better or worse, rely on consensus. If there is
anyone who has countering views and we don't as a group come up with
some formal statement about how things should be done, then it makes
it very hard for the likes of Robert and myself who need to implement
the thing. So, we need to deal with the different views people have
and balance them up and make a decision. Until I feel there is some
sort of official decision one way or another, I can't release any
code.

Graham

From maluke at gmail.com  Fri Apr 17 01:28:23 2009
From: maluke at gmail.com (Sergey Schetinin)
Date: Fri, 17 Apr 2009 02:28:23 +0300
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904161603w459683f0n310334e7a101ff6b@mail.gmail.com>
References: <88e286470904010329r5222c37bl73ab5dd234ac29de@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407C57FFE@ex10.hostedexchange.local>
	<ca471dc20904010934k74c4f6d8kda52bef12d03461b@mail.gmail.com>
	<86217.1238608796@parc.com>
	<4a951aa00904011615w58651c62ucadd5da07f4a6005@mail.gmail.com>
	<F1962646D3B64642B7C9A06068EE1E6407CC648F@ex10.hostedexchange.local>
	<88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>
	<052B16D8-D0F6-4D85-9A39-9BA9F2F544EA@fuhm.net>
	<88e286470904161603w459683f0n310334e7a101ff6b@mail.gmail.com>
Message-ID: <116315680904161628x73535c8dxde951fc33225f3ea@mail.gmail.com>

On Fri, Apr 17, 2009 at 02:03, Graham Dumpleton
<graham.dumpleton at gmail.com> wrote:
> 2009/4/17 James Y Knight <foom at fuhm.net>:
>> On Apr 16, 2009, at 3:12 AM, Graham Dumpleton wrote:
>>>
>>> I am not sure we ended up with a final answer on all of this, but I
>>> don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
>>> any longer. As such, am implementing things as per:
>>>
>>>  http://www.wsgi.org/wsgi/Amendments_1.0
>>>
>>> with exception that will not be attempting to do decoding per RFC
>>> 2047. Any CGI variables not related to HTTP headers will also be
>>> handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
>>> This should be equivalent with what wsgiref does in Python 3.X and
>>> basically keeps the status quo.
>>>
>>> If anyone has any last things to say on all of this, please speak up now.
>>
>>
>> IMO it would make more sense to have the headers be bytes instead of strings
>> decoded/encoded with latin-1, but it's not a huge deal...
>
> It is a huge deal in as much as we don't use any sort of formal voting
> process here and for better or worse, rely on consensus. If there is
> anyone who has countering views and we don't as a group come up with
> some formal statement about how things should be done, then it makes
> it very hard for the likes of Robert and myself who need to implement
> the thing. So, we need to deal with the different views people have
> and balance them up and make a decision. Until I feel there is some
> sort of official decision one way or another, I can't release any
> code.

+1 to Amendments.
I work with WSGI quite a lot and have a server implementation as well
(an experimental trellis-based server with async app extensions). While
I don't plan to use the 3.x branch anytime soon, all the amendments make
perfect sense to me. I did encounter user-agents that send the HTTP path
encoded in cp1251, for example, but I don't think that it's a good idea
to keep environ values as bytes and expect WSGI apps to sort out the
mess. The U-A that sent the broken path seemed to be some sort of
spider, so it's not like one would be losing visitors due to this.

From graham.dumpleton at gmail.com  Fri Apr 17 01:37:57 2009
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 17 Apr 2009 09:37:57 +1000
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <1239899638.19337.5.camel@haku>
References: <88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>
	<1239899638.19337.5.camel@haku>
Message-ID: <88e286470904161637p4e7ba4aj6228117d29439ac5@mail.gmail.com>

2009/4/17 Robert Brewer <fumanchu at aminus.org>:
> On Thu, 2009-04-16 at 00:12 -0700, Graham Dumpleton wrote:
>> > So, from where I sit, we have:
>> >
>> >  1. Many header values which are ASCII.
>> >  2. A few header values which are ISO-8859-1 plus RFC 2047.
>> >  3. A few header values which are URI's (no specified encoding) or
>> >     IRI's (UTF-8).
>> >
>> > I understand the desire to decode ASAP, and I agree with Guido that
>> > we should use a default encoding which the app can override. Looking
>> > at the above, ISO-8859-1 is the best encoding I know of for all three
>> > header cases. ASCII can be used as a valid subset without transcoding;
>> > headers which are ISO-8859-1 are decoded perfectly; URI/IRI headers
>> > can be transcoded by the app if needed, but mangled opaquely by
>> > middleware.
>> >
>> > If we make *that* call, then IMO there's no reason not to do the
>> > same to SCRIPT_NAME, PATH_INFO, and QUERY_STRING.
>>
>> I am not sure we ended up with a final answer on all of this, but I
>> don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
>> any longer. As such, am implementing things as per:
>>
>>   http://www.wsgi.org/wsgi/Amendments_1.0
>>
>> with exception that will not be attempting to do decoding per RFC
>> 2047. Any CGI variables not related to HTTP headers will also be
>> handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
>> This should be equivalent with what wsgiref does in Python 3.X and
>> basically keeps the status quo.
>>
>> If anyone has any last things to say on all of this, please speak up
>> now.
>>
> That sounds fine to me, Graham, and is what I'll be implementing in my
> python3 branch for CherryPy barring any unforeseen impediments.

Are you moving to using an empty string as the end-of-input sentinel
for wsgi.input, for the case where code does actually read more than
CONTENT_LENGTH?

Graham

From fumanchu at aminus.org  Fri Apr 17 02:06:55 2009
From: fumanchu at aminus.org (Robert Brewer)
Date: Thu, 16 Apr 2009 17:06:55 -0700
Subject: [Web-SIG] Python 3.0 and WSGI 1.0.
In-Reply-To: <88e286470904161637p4e7ba4aj6228117d29439ac5@mail.gmail.com>
References: <88e286470904160012g7c748d8bke584a5325fbdc03@mail.gmail.com>
	<1239899638.19337.5.camel@haku>
	<88e286470904161637p4e7ba4aj6228117d29439ac5@mail.gmail.com>
Message-ID: <1239926815.19337.13.camel@haku>

On Fri, 2009-04-17 at 09:37 +1000, Graham Dumpleton wrote:
> >> I am not sure we ended up with a final answer on all of this, but I
> >> don't want to hold up mod_wsgi 3.0, which includes Python 3.0 support,
> >> any longer. As such, am implementing things as per:
> >>
> >>   http://www.wsgi.org/wsgi/Amendments_1.0
> >>
> >> with exception that will not be attempting to do decoding per RFC
> >> 2047. Any CGI variables not related to HTTP headers will also be
> >> handled as latin-1, including SCRIPT_NAME, PATH_INFO and QUERY_STRING.
> >> This should be equivalent with what wsgiref does in Python 3.X and
> >> basically keeps the status quo.
> >>
> > That sounds fine to me, Graham, and is what I'll be implementing in my
> > python3 branch for CherryPy barring any unforeseen impediments.
> 
> Are you moving to using an empty string as the end-of-input sentinel
> for wsgi.input, for the case where code does actually read more than
> CONTENT_LENGTH?

Sure; I think that's reasonable. It's supposed to be 'file-like'.
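
As a rough sketch of that 'file-like' behaviour (not CherryPy's or
mod_wsgi's actual code), a server could wrap the socket stream so that
reads stop at CONTENT_LENGTH and then return an empty value, just as a
file does at EOF. The class name LimitedInput is hypothetical, and the
sketch assumes wsgi.input yields bytes under Python 3, so the sentinel
is b"".

```python
import io


class LimitedInput:
    """Wrap a request body stream so reads never go past CONTENT_LENGTH."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def read(self, size=-1):
        # Once the declared body is consumed, behave like a file at EOF.
        if self._remaining <= 0:
            return b""
        # Clamp unbounded or oversized reads to what is left of the body.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


# The underlying stream may hold bytes beyond the body, for example the
# start of a pipelined request; the wrapper must not hand those out.
socket_data = io.BytesIO(b"name=value" + b"GET /next HTTP/1.1")
wsgi_input = LimitedInput(socket_data, content_length=10)
first = wsgi_input.read(4096)   # the whole 10-byte body
second = wsgi_input.read(4096)  # empty: end-of-input sentinel
```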


Robert Brewer
fumanchu at aminus.org


From randy at rcs-comp.com  Mon Apr 27 04:32:20 2009
From: randy at rcs-comp.com (Randy Syring)
Date: Sun, 26 Apr 2009 22:32:20 -0400
Subject: [Web-SIG] Use 200 or 400 Status Code When...
Message-ID: <49F51934.90903@rcs-comp.com>

I have a page that accepts URL arguments like:

/student/<id>

The id must be an integer or the URL doesn't match and the user is given 
a 404.  But what should I do if the id is given, is an integer, but a 
student with that id does not exist?  I already output a message telling 
the user that they requested an invalid student.  However, should that 
document have a 200 or 400 (or some other) status code?

Thanks.

-- 
--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31


From t.broyer at gmail.com  Mon Apr 27 10:38:15 2009
From: t.broyer at gmail.com (Thomas Broyer)
Date: Mon, 27 Apr 2009 10:38:15 +0200
Subject: [Web-SIG] Use 200 or 400 Status Code When...
In-Reply-To: <49F51934.90903@rcs-comp.com>
References: <49F51934.90903@rcs-comp.com>
Message-ID: <a9699fd20904270138k7a27b979k97d769eb6dee8c38@mail.gmail.com>

On Mon, Apr 27, 2009 at 4:32 AM, Randy Syring <randy at rcs-comp.com> wrote:
> I have a page that accepts URL arguments like:
>
> /student/<id>
>
> The id must be an integer or the URL doesn't match and the user is given a
> 404.  But what should I do if the id is given, is an integer, but a student
> with that id does not exist?  I already output a message telling the user
> that they requested an invalid student.  However, should that document have
> a 200 or 400 (or some other) status code?

Obviously a 404 too, as the URL identifies something that doesn't exist.

(in the case of an invalid id, i.e. not a number, you could use 410
status code too)
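
A minimal WSGI sketch of the 404 case, assuming a hypothetical in-memory
STUDENTS table in place of the real data store; the names STUDENTS,
STUDENT_URL and app are made up for this example:

```python
import re

STUDENTS = {1: "Alice", 2: "Bob"}  # hypothetical stand-in for a database
STUDENT_URL = re.compile(r"^/student/([0-9]+)$")


def app(environ, start_response):
    match = STUDENT_URL.match(environ.get("PATH_INFO", ""))
    student = STUDENTS.get(int(match.group(1))) if match else None
    if student is None:
        # Either the URL did not match or no such student exists; in both
        # cases the URL identifies nothing, so the answer is 404.
        status = "404 Not Found"
        body = b"No such student.\n"
    else:
        status = "200 OK"
        body = ("Student: %s\n" % student).encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The explanatory message still goes in the response body; only the
status line changes from 200 to 404.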

-- 
Thomas Broyer

From randy at rcs-comp.com  Mon Apr 27 23:10:34 2009
From: randy at rcs-comp.com (Randy Syring)
Date: Mon, 27 Apr 2009 17:10:34 -0400
Subject: [Web-SIG] Use 200 or 400 Status Code When...
In-Reply-To: <a9699fd20904270138k7a27b979k97d769eb6dee8c38@mail.gmail.com>
References: <49F51934.90903@rcs-comp.com>
	<a9699fd20904270138k7a27b979k97d769eb6dee8c38@mail.gmail.com>
Message-ID: <49F61F4A.8000602@rcs-comp.com>

Thomas,

Unfortunately, it wasn't obvious to me that a 404 was appropriate in 
this situation.  But, now that you mention it, I think you are right.  
Thank you for your input.

--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31



From randy at rcs-comp.com  Mon Apr 27 23:19:18 2009
From: randy at rcs-comp.com (Randy Syring)
Date: Mon, 27 Apr 2009 17:19:18 -0400
Subject: [Web-SIG] empty action attribute with forms in Google Chrome
Message-ID: <49F62156.5080006@rcs-comp.com>

For the last four years, I have always used an empty action attribute on 
my form to make it post back to the current URL.  I almost always 
validate my HTML and this has never come up as a violation.  
Furthermore, I have read various people on the web advocating this practice.

Recently, however, I went to use Google Chrome to look at some of my web 
apps and I noticed that none of my forms work.  I use a <base> tag and 
empty form attributes.  Whenever I submit a form in Chrome, it gets 
posted to the root URL (i.e. what I have in my <base> tag).  Am I 
violating the spec or is this something Google Chrome got wrong?  What I 
have works in IE, FF, and Opera.

Thanks.

-- 
--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31


From t.broyer at gmail.com  Tue Apr 28 10:01:04 2009
From: t.broyer at gmail.com (Thomas Broyer)
Date: Tue, 28 Apr 2009 10:01:04 +0200
Subject: [Web-SIG] empty action attribute with forms in Google Chrome
In-Reply-To: <49F62156.5080006@rcs-comp.com>
References: <49F62156.5080006@rcs-comp.com>
Message-ID: <a9699fd20904280101p4862e9d3jfa792ccb2f85862a@mail.gmail.com>

On Mon, Apr 27, 2009 at 11:19 PM, Randy Syring <randy at rcs-comp.com> wrote:
> For the last four years, I have always used an empty action attribute on my
> form to make it post back to the current URL.  I almost always validate my
> HTML and this has never come up as a violation.  Furthermore, I have read
> various people on the web advocating this practice.
>
> Recently, however, I went to use Google Chrome to look at some of my web
> apps and I noticed that none of my forms work.  I use a <base> tag and
> empty form attributes.  Whenever I submit a form in Chrome, it gets posted
> to the root URL (i.e. what I have in my <base> tag).  Am I violating the
> spec or is this something Google Chrome got wrong?

You are violating the spec (or, actually, this is a bit of a blurry thing
in the spec re. a "same document reference").

> What I have works in IE, FF, and Opera.

Yes, because they're violating the spec too. HTML5 defines form
submission in a way that violates RFC 3986, to make it work like IE, FF
and Opera:
http://www.w3.org/TR/html5/forms.html#form-submission-algorithm (step 9)
The comment there (an HTML comment, look at the source of the page) says:
    <!-- Don't ask me why. But that's what IE does. It even treats
    action="" differently from action=" " or action="#" (the latter
    two resolve to the base URL, the first one resolves to the doc
    URL). And other browsers concur. It is even required, see e.g.
      http://bugs.webkit.org/show_bug.cgi?id=7763
      https://bugzilla.mozilla.org/show_bug.cgi?id=297761
    -->
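
To make the difference concrete, here is a small sketch using Python's
urllib.parse.urljoin, which resolves an empty reference to the base URL.
The URLs and the function browser_form_target are hypothetical, and the
function is only a rough model of the action="" special case, not the
full HTML5 algorithm.

```python
from urllib.parse import urljoin

document_url = "http://example.com/app/student/edit"  # hypothetical page URL
base_href = "http://example.com/"                     # hypothetical <base href>

# Plain reference resolution: an empty reference yields the base URL,
# which matches what Chrome was reported to do.
rfc_target = urljoin(base_href, "")


def browser_form_target(action, document_url, base_href):
    """Rough model of the IE/FF/Opera behaviour for a form's action."""
    if action == "":
        # Special case: an empty action posts back to the document itself.
        return document_url
    return urljoin(base_href, action)


compat_target = browser_form_target("", document_url, base_href)
```

Writing the action URL out explicitly avoids depending on either
behaviour.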

(I'm not sure web-sig is the appropriate list for these questions, as
they're unrelated to Python; maybe http://www.whatwg.org/mailing-list
or http://forums.whatwg.org/ )

-- 
Thomas Broyer

From randy at rcs-comp.com  Tue Apr 28 16:38:54 2009
From: randy at rcs-comp.com (Randy Syring)
Date: Tue, 28 Apr 2009 10:38:54 -0400
Subject: [Web-SIG] empty action attribute with forms in Google Chrome
In-Reply-To: <a9699fd20904280101p4862e9d3jfa792ccb2f85862a@mail.gmail.com>
References: <49F62156.5080006@rcs-comp.com>
	<a9699fd20904280101p4862e9d3jfa792ccb2f85862a@mail.gmail.com>
Message-ID: <49F714FE.4050904@rcs-comp.com>

Thomas,

Thanks for your info.  Looks like I need to change my SOP.

And you are right, I should find a different list for these questions.  
I am using a python web app, but these questions are generic enough to 
go somewhere else.  Thanks for the kind word and your advice.

--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31

