[Web-SIG] Python 3.0 and WSGI 1.0.

Graham Dumpleton graham.dumpleton at gmail.com
Fri Apr 3 00:27:21 CEST 2009


2009/4/3 James Y Knight <foom at fuhm.net>:
>
> On Apr 2, 2009, at 1:40 PM, Tres Seaver wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> James Y Knight wrote:
>>>
>>> On Apr 2, 2009, at 7:33 AM, Graham Dumpleton wrote:
>>>
>>>> """When running under Python 3, servers MUST provide CGI HTTP
>>>> variables as strings, decoded from the headers using HTTP standard
>>>> encodings (i.e. latin-1 + RFC 2047)"""
>>>>
>>>> Which is fair enough and basically what the RFCs say. At the moment I
>>>> don't apply RFC 2047 rules in Python 3.0 support in mod_wsgi, so just
>>>> need to do that.
>>>
>>> I'd really *really* like to recommend that any mention of RFC 2047 is
>>> stricken from the WSGI server requirements. I cannot imagine that
>>> decoding actually accomplishing anything other than opening security
>>> holes (think a filter in an upstream proxy that doesn't know how to do
>>> 2047-decoding passing something through that you now decode.)
>>>
>>> Also, you have to only do the decoding on TEXT words according to the
>>> spec, so the WSGI container now needs an HTTP header parser just in
>>> order to determine where it should decode RFC2047 words and where not
>>> to? I don't think so...
>>
>> Couldn't the spec mandate that decoding RFC 2047 headers is the
>> responsibility of the non-middleware WSGI server?  I agree that
>> middleware and applications shouldn't know or care about that problem.
>> Under Python 2.x, the server would transcode those values to the
>> "common" encoding used for all values in the WSGI environment;  under
>> Python 3.x, it would just decode them to unicode.
>>
>
> I think you're saying you agree with exactly the opposite of what I meant.
> The server/gateway (aka apache mod_wsgi) *must not* be required to handle
> RFC2047 decoding. Only the application (or a header parsing library that the
> application uses) can possibly handle this properly.
>
> That's why I think it should not be mentioned at all in the WSGI
> requirements for the server.
>
> Furthermore, although they certainly can if they want, I'd recommend that no
> applications actually bother with doing such decoding, since RFC2047 words
> in http headers are essentially never used.

Having the WSGI adapter ignore it would be fine by me, as it then
effectively mirrors the current behaviour of Python 2.X. That is, in
Python 2.X the WSGI application would have to deal with them anyway.
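
As a rough sketch of what "the application deals with them" could look
like, the following uses the standard library's email.header module to
decode RFC 2047 encoded-words in a value taken from the WSGI environ.
The helper name decode_rfc2047 and the X-Subject header are made up for
illustration, and the sketch assumes the server has already handed the
value over as a str without touching any encoded-words.

```python
from email.header import decode_header, make_header


def decode_rfc2047(value):
    """Decode any RFC 2047 encoded-words found in a header value.

    Values without encoded-words are returned unchanged.
    """
    return str(make_header(decode_header(value)))


def application(environ, start_response):
    # HTTP_X_SUBJECT is a hypothetical header, used only for illustration.
    raw = environ.get("HTTP_X_SUBJECT", "")
    # Decoding happens here, in the application, not in the server/gateway.
    subject = decode_rfc2047(raw)
    body = subject.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Only the application knows which headers may legitimately carry
encoded-words, so it can restrict the decoding to those and leave every
other header alone.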

If RFC2047 comes into play in response headers as well, then that is
also the WSGI application's responsibility, given that it should be
returning bytes for response headers and so would have had to apply
such an encoding itself if necessary anyway.
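
For the response side, a minimal sketch of the application applying the
encoding itself, again with email.header from the standard library. The
helper name encode_rfc2047 is hypothetical, UTF-8 is an arbitrary choice
of charset, and the bytes return type follows the assumption stated
above that response headers are bytes.

```python
from email.header import Header, decode_header, make_header


def encode_rfc2047(text, charset="utf-8"):
    """Return text as an RFC 2047 encoded header value, as ASCII bytes."""
    # Header.encode() yields a str containing only ASCII characters.
    encoded = Header(text, charset).encode()
    return encoded.encode("ascii")


# Hypothetical non-ASCII header value, used only for illustration.
value = encode_rfc2047("caf\u00e9")

# Round trip back to text to check that nothing was lost.
round_trip = str(make_header(decode_header(value.decode("ascii"))))
```

Long values may be folded across lines by Header.encode(), so an
application using this for HTTP would need to keep such values short or
handle the folding itself.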

For WSGI 1.0 and Python 3.0 we can therefore possibly maintain the
status quo, or as close to it as possible, with Python 2.X behaviour.
If we want to think about changing it, then address it in WSGI 2.0,
where more significant changes are being made anyway. Better that than
WSGI 1.0 having different requirements under Python 2.X and Python 3.0.

Graham

