[Web-SIG] Request for Comments on upcoming WSGI Changes

P.J. Eby pje at telecommunity.com
Mon Sep 21 21:31:28 CEST 2009


At 11:23 AM 9/21/2009 -0700, Robert Brewer wrote:
>I still don't see why the environ should have multiple versions of 
>anything. It's not as if the HTTP request gives us multiple 
>Request-URI's. There's a single processing step that has to happen 
>somewhere: decoding the bytes of the Request-URI to unicode. For the 
>vast majority of apps, it should only happen once. Twice is 
>acceptable to me for some apps. As I pointed out in the linked 
>email, doing that as soon as possible (i.e. in the WSGI origin 
>server) allows URI's to be compared as character strings more 
>easily. If you deploy a piece of middleware that transcodes (based 
>on more information than servers want to deal with), it had better 
>be nearly first in the stack so routing works reliably.

The problem with this whole approach is that it's not
composable.  You can't stick an application under a router if the
application uses a different method for grokking its subtree of the
URI space, unless it knows what's been done to the URI and can undo it.
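To make the "can't un-do it" point concrete, here is a minimal sketch. It assumes a hypothetical origin server that eagerly decodes the path as UTF-8 with replacement characters, and a hypothetical application mounted below it that expects cp1252. Neither is taken from any real server; they only illustrate why an eager decode is not reversible.

```python
# Bytes as they arrive on the wire: "/café" encoded in cp1252,
# which is not valid UTF-8.
raw_path = b"/caf\xe9"

# Hypothetical origin server: decode eagerly as UTF-8, replacing bad bytes.
server_view = raw_path.decode("utf-8", errors="replace")

# Hypothetical cp1252 application mounted under a router tries to undo the
# server's step by re-encoding, then decodes with its own charset.
undone = server_view.encode("utf-8")
app_view = undone.decode("cp1252")

print(undone == raw_path)   # False: byte 0xE9 was replaced, not preserved
print(app_view == "/café")  # False: the application sees garbage instead
```

Once the server has replaced the undecodable byte with U+FFFD, no application further down the stack can recover the original request.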

Maybe I'm missing something, but the only way I see to preserve
composability here is to use latin-1 or bytes.  Latin-1 maps each byte
to exactly one character, so the original bytes can always be recovered.
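A minimal sketch of why latin-1 keeps things composable, assuming the server passes each path along as a str holding the raw bytes decoded as latin-1. The `transcode` helper is hypothetical, not part of any WSGI API: each mounted application uses it to get the original bytes back and apply whatever charset it actually expects.

```python
def transcode(native: str, charset: str) -> str:
    """Recover the original bytes from a latin-1 "native" string, then
    decode them with the charset this particular application expects."""
    return native.encode("latin-1").decode(charset)


# Latin-1 maps each of the 256 byte values to exactly one code point,
# so bytes -> str -> bytes is lossless for any input.
every_byte = bytes(range(256))
assert every_byte.decode("latin-1").encode("latin-1") == every_byte

# The server hands both paths on as latin-1 strings without guessing.
utf8_native = b"/caf\xc3\xa9".decode("latin-1")
cp1252_native = b"/caf\xe9".decode("latin-1")

# Each mounted application applies its own charset to its own subtree.
print(transcode(utf8_native, "utf-8"))     # /café
print(transcode(cp1252_native, "cp1252"))  # /café
```

Because no information is lost at the server, two applications with different charset assumptions can live side by side under the same router.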

The fundamental problem is that, like it or not, HTTP headers are 
actually byte strings.  The *only* reason we ever supported unicode 
in WSGI was to handle platforms where there's no such thing as a 
non-unicode string, and there we made it explicit that it's just a 
way of manipulating *bytes*, not unicode.

ISTM that very few (if any) of the proposals floating around for 
modifying WSGI are taking this concept into account.  Most of them 
sound to me like people saying, "yeah, but this particular hack will 
work for *my* apps...  so everybody else must be doing something stupid."

But WSGI was built on the principle of *equally inconveniencing 
everyone*, specifically to avoid an impossible attempt at consensus 
between incompatible ways of doing things.  (E.g., nine million 
request/response APIs.)

So, if the only problem we're going to cause by using bytes 
everywhere is to make everyone need to change their routing code on 
Python 3, I vote +1000.  ;-)
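As a rough illustration of what "changing routing code" might mean, here is a minimal sketch of a prefix router that works on raw bytes in Python 3 and leaves decoding to each mounted application. The `route` function, the `mounts` table, and the charsets chosen are all hypothetical.

```python
from typing import Callable, Dict

# A mounted application receives the remainder of the path as raw bytes.
App = Callable[[bytes], str]


def route(path: bytes, mounts: Dict[bytes, App]) -> str:
    """Dispatch on byte-string prefixes, longest first, without decoding."""
    for prefix in sorted(mounts, key=len, reverse=True):
        # Match only on a segment boundary, so b"/wiki" does not
        # claim b"/wikipedia".
        if path == prefix or path.startswith(prefix + b"/"):
            return mounts[prefix](path[len(prefix):])
    raise LookupError(path)


mounts: Dict[bytes, App] = {
    # Each subtree decodes its own remainder with its own charset.
    b"/wiki": lambda rest: "wiki:" + rest.decode("utf-8"),
    b"/legacy": lambda rest: "legacy:" + rest.decode("cp1252"),
}

print(route(b"/wiki/caf\xc3\xa9", mounts))  # wiki:/café
print(route(b"/legacy/caf\xe9", mounts))    # legacy:/café
```

The router only compares bytes, so it never has to guess an encoding on behalf of the applications beneath it.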


