[Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO

Wed Sep 23 05:05:49 CEST 2009

Hello Ian,

I really like your proposal.

Massimo

On Sep 22, 2009, at 9:22 PM, Ian Bicking wrote:

> OK, I mentioned this in the last thread, but... I can't keep up with  
> all this discussion, and I bet you can't either.
>
> So, here's a rough proposal for WSGI and unicode:
>
> I propose we switch primarily to "native" strings: str on both  
> Python 2 and 3.
>
> Specifically:
>
> environ keys: native
> environ CGI values: native
> wsgi.* (that is text): native
> response status: native
> response headers: native
>
> wsgi.input remains byte-oriented, as does the response app_iter.
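A server or test suite could check an environ against the types listed above. This is a minimal Python 3 sketch that assumes the proposed key names. `check_environ` is a hypothetical helper, and treating every all-upper-case key as a CGI variable is a simplification.

```python
import io

# Textual wsgi.* keys from the proposal (script_name/path_info are proposed,
# not standard WSGI keys).
TEXT_WSGI_KEYS = ("wsgi.url_scheme", "wsgi.script_name", "wsgi.path_info")


def check_environ(environ):
    """Hypothetical checker: native-str keys and text values, bytes input."""
    for key, value in environ.items():
        # environ keys are native strings (str on Python 3)
        if not isinstance(key, str):
            raise TypeError("environ key %r is not a native str" % (key,))
        # CGI variables (simplified here as all-upper-case keys) and the
        # textual wsgi.* values must be native strings too
        if key.isupper() or key in TEXT_WSGI_KEYS:
            if not isinstance(value, str):
                raise TypeError("environ[%r] is not a native str" % key)
    # wsgi.input stays byte-oriented: even an empty read must give bytes
    if not isinstance(environ["wsgi.input"].read(0), bytes):
        raise TypeError("wsgi.input must yield bytes")


environ = {
    "REQUEST_METHOD": "GET",
    "HTTP_HOST": "example.com",
    "wsgi.url_scheme": "http",
    "wsgi.input": io.BytesIO(b""),
}
check_environ(environ)  # passes silently
```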
>
> I then propose that we eliminate SCRIPT_NAME and PATH_INFO.  Instead  
> we have:
>
> wsgi.script_name
> wsgi.path_info (I'm not entirely set on these names)
>
> Together these form the original path.  It is not URL decoded, so it
> should be ASCII.  (I believe non-ASCII could be rejected by the
> server with 400 Bad Request?  A server could also choose to treat it
> as UTF8 or Latin1 and percent-encode unsafe characters to make it
> ASCII.)  Thus to re-form the URL, you do:
>
> (environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST'] +
>  environ['wsgi.script_name'] + environ['wsgi.path_info'] + '?' +
>  environ['QUERY_STRING'])
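Under the proposal, the server hands the path over still encoded and ASCII-only, so the application can rebuild the URL by plain concatenation. This is a minimal Python 3 sketch that assumes the proposed `wsgi.script_name`/`wsgi.path_info` keys and a Host header. `ascii_path` and `reconstruct_url` are hypothetical names. The server here takes the percent-encoding option rather than rejecting the request.

```python
from urllib.parse import quote


def ascii_path(raw_path):
    """Hypothetical server step: make the raw request path (bytes) ASCII.

    Non-ASCII bytes are percent-encoded.  '%' and common path characters
    are listed as safe so an already-encoded path passes through unchanged.
    """
    return quote(raw_path, safe="/%:@!$&'()*+,;=")


def reconstruct_url(environ):
    # The path keys are still URL-encoded, so concatenation is enough.
    url = environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST']
    url += environ['wsgi.script_name'] + environ['wsgi.path_info']
    # Only append '?' when there is a query string.
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url


environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'example.com',
    'wsgi.script_name': '/app',
    'wsgi.path_info': ascii_path(b'/caf\xc3\xa9/a%2Fb'),
    'QUERY_STRING': 'q=1',
}
print(reconstruct_url(environ))  # http://example.com/app/caf%C3%A9/a%2Fb?q=1
```

Unlike the one-liner, this version omits the '?' when QUERY_STRING is empty. Because the path is never decoded, the encoded slash `%2F` survives. A decoded PATH_INFO could not tell it apart from a real '/'.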
>
> All incoming headers will be treated as Latin1.  If an application  
> suspects another encoding, it is up to the application to transcode  
> the header into another encoding.  The transcoded value should not  
> be put into the environ.  In most cases headers should be ASCII, and  
> Latin1 is simply a fallback that allows all bytes to be represented  
> in both Python 2 and 3.
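The Latin1 convention works because Latin1 maps each byte value 0-255 to exactly one code point, so decoding and re-encoding is lossless. This is a minimal Python 3 sketch. The `X-Name` header and the `header_as_utf8` helper are hypothetical.

```python
# Every byte value survives a Latin-1 decode/encode round trip.
all_bytes = bytes(range(256))
assert all_bytes.decode('latin-1').encode('latin-1') == all_bytes

# Server side: a header that arrived as UTF-8 bytes is stored as Latin-1 text.
raw_value = 'Zo\u00eb'.encode('utf-8')  # b'Zo\xc3\xab' on the wire
environ = {'HTTP_X_NAME': raw_value.decode('latin-1')}


def header_as_utf8(environ, key):
    """Application side: transcode a value suspected to hold UTF-8.

    The result is returned to the caller and is not written back into
    environ, as the proposal asks."""
    return environ[key].encode('latin-1').decode('utf-8')


mojibake = environ['HTTP_X_NAME']              # 'Zo\xc3\xab', i.e. 'ZoÃ«'
text = header_as_utf8(environ, 'HTTP_X_NAME')  # 'Zo\xeb', i.e. 'Zoë'
```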
>
> Similarly all outgoing headers will be Latin1.  Thus if you (against  
> good sense) decide to put UTF8 into a cookie, you can do:
>
> headers.append(('Set-Cookie',
>                 unicode_text.encode('UTF8').decode('latin1')))
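Seen from the server's side, the Latin1 round trip in the snippet above can be sketched as follows. This is a minimal Python 3 sketch. `encode_headers` and the cookie name are hypothetical. Raising on text outside Latin1 is only one possible answer to the open question about such headers.

```python
def encode_headers(status, headers):
    """Hypothetical server step: serialize a native-str status line and
    headers into the bytes written to the socket, using Latin-1."""
    lines = ['HTTP/1.1 ' + status]
    lines += ['%s: %s' % (name, value) for name, value in headers]
    # str.encode('latin-1') raises UnicodeEncodeError for code points
    # above U+00FF; this sketch simply lets that error propagate.
    return ('\r\n'.join(lines) + '\r\n\r\n').encode('latin-1')


unicode_text = 'caf\u00e9'
headers = [('Set-Cookie',
            'name=' + unicode_text.encode('UTF8').decode('latin1'))]
wire = encode_headers('200 OK', headers)
# The header bytes on the wire are the UTF-8 encoding of the text:
print(wire)  # b'HTTP/1.1 200 OK\r\nSet-Cookie: name=caf\xc3\xa9\r\n\r\n'

# Text that Latin-1 cannot represent (here U+20AC) is rejected.
try:
    encode_headers('200 OK', [('X-Price', '10 \u20ac')])
except UnicodeEncodeError:
    print('rejected')
```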
>
> The server will then encode the text as Latin1, sending the UTF8
> bytes.  This is lame, but non-ASCII in headers is lame.  It would be
> preferable to do:
>
> headers.append(('Set-Cookie',
>                 urllib.quote(unicode_text.encode('UTF8'))))
>
> This sends different text, but is highly preferable.  If you wanted  
> to parse a cookie that was set as UTF8, you'd do:
>
> parse_cookie(environ['HTTP_COOKIE'].encode('latin1').decode('utf8'))
>
> Again, it would be better to do:
>
> parse_cookie(urllib.unquote(environ['HTTP_COOKIE']).decode('utf8'))
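The preferred ASCII-only approach looks like this on Python 3. There, `quote`/`unquote` live in `urllib.parse`, and `unquote()` already returns text (decoding the escapes as UTF-8 by default), which absorbs the separate `.decode('utf8')` step. `make_cookie` and `parse_cookie` are hypothetical minimal sketches, not a real cookie library. The parser splits on ';' before unquoting, so an encoded '%3B' is never mistaken for a separator.

```python
from urllib.parse import quote, unquote


def make_cookie(name, unicode_text):
    """Hypothetical helper: percent-encode the UTF-8 bytes so the
    Set-Cookie value is pure ASCII."""
    return ('Set-Cookie', name + '=' + quote(unicode_text.encode('utf-8')))


def parse_cookie(header):
    """Hypothetical minimal parser: split on ';' first, then unquote each
    value, so an encoded '%3B' cannot be mistaken for a separator."""
    cookies = {}
    for part in header.split(';'):
        name, sep, value = part.strip().partition('=')
        if sep:
            cookies[name] = unquote(value)  # decodes escapes as UTF-8
    return cookies


header_name, header_value = make_cookie('greeting', 'h\u00e9llo; world')
# header_value == 'greeting=h%C3%A9llo%3B%20world'

# The browser later sends the cookie back in the Cookie header:
environ = {'HTTP_COOKIE': header_value + '; other=1'}
cookies = parse_cookie(environ['HTTP_COOKIE'])
# cookies == {'greeting': 'h\u00e9llo; world', 'other': '1'}
```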
>
> Other variables like environ['wsgi.url_scheme'],
> environ['CONTENT_TYPE'], etc., will be native strings.  A Python 3
> hello world app will then look like:
>
> def hello_world(environ):
>     return ('200 OK',
>             [('Content-type', 'text/html; charset=utf8')],
>             ['Hello World!'.encode('utf8')])
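The app above returns its status, headers and body instead of calling start_response. To run such an app under an existing PEP 333 server, a thin adapter could bridge the two styles. This is a minimal sketch. `to_wsgi1` is a hypothetical name, and the app is repeated so the block runs on its own.

```python
def hello_world(environ):
    return ('200 OK',
            [('Content-type', 'text/html; charset=utf8')],
            ['Hello World!'.encode('utf8')])


def to_wsgi1(app):
    """Hypothetical adapter: wrap an app returning (status, headers, body)
    as a PEP 333-style callable that reports status via start_response."""
    def wsgi_app(environ, start_response):
        status, headers, body = app(environ)
        start_response(status, headers)
        return body
    return wsgi_app


# Exercise the adapter with a stand-in start_response.
captured = []


def start_response(status, headers):
    captured.append((status, headers))


body = to_wsgi1(hello_world)({}, start_response)
print(captured[0][0], b''.join(body))  # 200 OK b'Hello World!'
```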
>
> start_response and changes to wsgi.input are incidental to what I'm
> proposing here (except that wsgi.input will be bytes); we can decide
> about them separately.
>
>
>
> Outstanding issues:
>
> Well, the biggie: is it right to use native strings for the environ  
> values, and response status/headers?  Specifically, tricks like the  
> latin1 transcoding won't work in Python 2, but will in Python 3.  Is  
> this weird?  Or just something you have to think about when using  
> the two Python versions?
>
> What happens if you give unicode text in the response headers that  
> cannot be encoded as Latin1?
>
> Should some things specifically be ASCII?  E.g., status.
>
> Should some things be unicode on Python 2?
>
> Is there a common case here that would be inefficient?
>
>
>
> -- 
> Ian Bicking  |  http://blog.ianbicking.org  |  http://topplabs.org/civichacker