[Web-SIG] WSGI for Python 3

Graham Dumpleton graham.dumpleton at gmail.com
Wed Jul 14 07:04:48 CEST 2010


On 14 July 2010 14:43, Ian Bicking <ianb at colorstudy.com> wrote:
> So... there's been some discussion of WSGI on Python 3 lately.  I'm not
> feeling as pessimistic as some people, I feel like we were close but just
> didn't *quite* get there.

What I took from the discussion wasn't that one couldn't specify a
WSGI interface; as you say, we more or less have one now. The issue
is more about how practical it is, from a usability perspective, for
those who have to code stuff on top.

The concern seems to be that the specification may be easy to work
with for those who, at the lowest layer, immediately wrap it in a
higher level abstraction that normalises everything into a form that
is then used consistently. For those who use lower level raw WSGI
right through the stack, though, especially in the context of
stackable WSGI middleware, the repetitive task of having to deal with
the byte/unicode issues at every point is just a big PITA.

That said, my job in writing the WSGI adapter is really easy as I
don't have to worry about these issues. This is why I don't seem to
really appreciate the concerns people are expressing. The above is how
I read things though.
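
To make that concern concrete, here is a minimal sketch of what a
piece of raw WSGI middleware might have to do if environ values are
native strings decoded as Latin-1, as proposed in this thread. The
names text_from_environ, text_to_environ and mount_under are
hypothetical, and UTF-8 as the real charset of the path is an
assumption; every middleware in a stack would repeat the same dance.

```python
def text_from_environ(environ, key, charset="utf-8"):
    """Undo the Latin-1 'native string' decoding and decode properly."""
    return environ.get(key, "").encode("latin-1").decode(charset, "replace")


def text_to_environ(text, charset="utf-8"):
    """Put real text back into the Latin-1 native-string form."""
    return text.encode(charset).decode("latin-1")


def mount_under(app, prefix):
    """Hypothetical middleware: shift `prefix` from PATH_INFO to SCRIPT_NAME."""
    def middleware(environ, start_response):
        # Step 1: recover real text before any comparison is meaningful.
        path = text_from_environ(environ, "PATH_INFO")
        if path.startswith(prefix):
            script = text_from_environ(environ, "SCRIPT_NAME")
            # Step 2: re-encode on the way out so the next layer sees
            # the same convention it would have seen from the server.
            environ["SCRIPT_NAME"] = text_to_environ(script + prefix)
            environ["PATH_INFO"] = text_to_environ(path[len(prefix):])
        return app(environ, start_response)
    return middleware
```

A framework that wraps the environ once at the bottom of the stack
pays this cost a single time, which is the contrast being drawn above.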

> Here's my thoughts:
>
> * Everyone agrees keys in the environ should be native strings
> * Bodies should stay bytes
> * Can we make all "standard" values that are str on Python 2, str on Python
> 3 with a Latin1 encoding?  This is basically what wsgiref did.  This means
> HTTP_*, SERVER_NAME, etc.  Everything CGIish, and everything with an
> all-caps key.  There's only a couple tricky keys: SCRIPT_NAME, PATH_INFO,
> and HTTP_COOKIE.
> * I propose we let libraries handle HTTP_COOKIE however they want; don't
> bother transcoding *into* the environ, just do so when you parse the cookie
> (if you so choose).  Happy developers will just urlencode all their cookie
> values to keep their cookies ASCII-clean.  Unhappy developers who have to
> handle legacy cookies will just run environ['HTTP_COOKIE'].decode('latin1')
> and then do whatever sad magic they are forced to do.
> * I (re)propose we eliminate SCRIPT_NAME and PATH_INFO and replace them
> exclusively with encoded versions (that represent the original request
> URI).  We use Latin1 encoding, but it should be ASCII anyway, like most of
> the headers.
> * I'm terrible at naming, but let's say these new values are RAW_SCRIPT_NAME
> and RAW_PATH_INFO.

My prior suggestion on that, since upper case keys for now effectively
derive from CGI, was to make them wsgi.script_name and wsgi.path_info.
I.e., push them into the wsgi namespace.
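
As an illustration only, an application might consume such a key along
these lines. This is a sketch under assumptions: wsgi.path_info is the
name suggested above and not part of any published WSGI specification,
it is assumed to hold the still percent-encoded path from the request
URI, and decoded_path_info and the UTF-8 default are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def decoded_path_info(environ, charset="utf-8"):
    """Return PATH_INFO as real text, preferring the raw wsgi.* key."""
    raw = environ.get("wsgi.path_info")  # hypothetical key, see above
    if raw is not None:
        # Raw form: percent-decode to bytes, then pick the charset once.
        return unquote_to_bytes(raw).decode(charset, "replace")
    # Fallback: classic PATH_INFO as a Latin-1 decoded native string.
    path = environ.get("PATH_INFO", "")
    return path.encode("latin-1").decode(charset, "replace")
```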

> Does this solve everything?  There's broken stuff in the stdlib, but we
> shouldn't bother ourselves with that -- if we need working code we should
> just write it and ignore the stdlib or submit our stuff as patches to the
> stdlib.

The quick summary of what I suggested before is at:

  http://code.google.com/p/modwsgi/wiki/SupportForPython3X

I believe the only difference is the raw SCRIPT_NAME and PATH_INFO,
which were discussed to death previously with no consensus.

> Some environments will have a hard time constructing RAW_SCRIPT_NAME and
> RAW_PATH_INFO, but in my opinion they can just encode SCRIPT_NAME and
> PATH_INFO and be done with it; it's not as accurate, but it's no less
> accurate than what we have now.
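
A rough sketch of that fallback, for a gateway that only has the
already-decoded CGI values to work from. The helper name add_raw_paths
is hypothetical, the RAW_* key names are the placeholders proposed
above, and the values are assumed to be Latin-1 decoded native strings.

```python
from urllib.parse import quote


def add_raw_paths(environ):
    """Fill in RAW_SCRIPT_NAME / RAW_PATH_INFO when the server lacks them."""
    for key in ("SCRIPT_NAME", "PATH_INFO"):
        raw_key = "RAW_" + key
        if raw_key not in environ:
            # Recover the original bytes, then percent-encode them.
            # "/" is left alone, so an original "%2F" cannot be told
            # apart from "/": this is the loss of accuracy mentioned.
            as_bytes = environ.get(key, "").encode("latin-1")
            environ[raw_key] = quote(as_bytes)
    return environ
```

Values supplied by a server that does know the original request URI
are left untouched, so the fallback only applies where nothing better
is available.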
>
> Actual transcoding in the environ is not supported or encouraged in this
> scheme.  If you want to adjust an encoding you should do it in your
> application/library code.
>
> There's some other topics, like chunked responses, unknown request body
> lengths, start_response, and maybe some other things, but these aren't
> Python 3 issues, they are just... generic issues.  app_iter.close() might be
> worth thinking about given new iterator semantics introduced since WSGI was
> written.

Graham

