[Web-SIG] Proposal to remove SCRIPT_NAME/PATH_INFO
Massimo Di Pierro
mdipierro at cs.depaul.edu
Wed Sep 23 05:05:49 CEST 2009
Hello Ian,
I really like your proposal.
Massimo
On Sep 22, 2009, at 9:22 PM, Ian Bicking wrote:
> OK, I mentioned this in the last thread, but... I can't keep up with
> all this discussion, and I bet you can't either.
>
> So, here's a rough proposal for WSGI and unicode:
>
> I propose we switch primarily to "native" strings: str on both
> Python 2 and 3.
>
> Specifically:
>
> environ keys: native
> environ CGI values: native
> wsgi.* (that is text): native
> response status: native
> response headers: native
>
> wsgi.input remains byte-oriented, as does the response app_iter.
>
> I then propose that we eliminate SCRIPT_NAME and PATH_INFO. Instead
> we have:
>
> wsgi.script_name
> wsgi.path_info (I'm not entirely set on these names)
>
> Together these form the original path. The path is not URL decoded,
> so it should be ASCII. (I believe non-ASCII could be rejected by the
> server, with Bad Request? A server could also choose to treat it as
> UTF8 or Latin1 and encode unsafe characters to make it ASCII.) Thus
> to re-form the URL, you do:
>
> environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST'] +
> environ['wsgi.script_name'] + environ['wsgi.path_info'] + '?' +
> environ['QUERY_STRING']
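
For illustration, the URL reconstruction quoted above can be sketched as a small function. This is only a sketch under assumptions: `wsgi.script_name` and `wsgi.path_info` are the proposed (not yet settled) key names, `reconstruct_url` and `example_environ` are hypothetical names, and dropping the '?' when QUERY_STRING is empty is my own addition rather than part of the proposal.

```python
def reconstruct_url(environ):
    """Rebuild the request URL from the proposed (hypothetical) environ keys."""
    url = (environ['wsgi.url_scheme'] + '://' + environ['HTTP_HOST']
           + environ['wsgi.script_name'] + environ['wsgi.path_info'])
    # Assumption: leave out the '?' when there is no query string.
    query = environ.get('QUERY_STRING', '')
    if query:
        url += '?' + query
    return url


example_environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'example.com',
    'wsgi.script_name': '/app',
    'wsgi.path_info': '/caf%C3%A9',  # not URL decoded, so plain ASCII
    'QUERY_STRING': 'q=1',
}
print(reconstruct_url(example_environ))  # http://example.com/app/caf%C3%A9?q=1
```

Because neither path component is URL decoded, plain concatenation is enough and no re-quoting step is needed.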
>
> All incoming headers will be treated as Latin1. If an application
> suspects another encoding, it is up to the application to transcode
> the header into another encoding. The transcoded value should not
> be put into the environ. In most cases headers should be ASCII, and
> Latin1 is simply a fallback that allows all bytes to be represented
> in both Python 2 and 3.
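
A quick Python 3 check of why Latin1 works as the fallback: every byte value maps to exactly one code point, so a header decoded as Latin1 can always be turned back into the original bytes and then decoded with whatever encoding the application suspects. The variable names here are hypothetical.

```python
# Latin1 maps bytes 0..255 one-to-one onto code points 0..255,
# so decoding and re-encoding never loses information.
raw = bytes(range(256))
assert raw.decode('latin1').encode('latin1') == raw

# A header value that was really UTF-8 on the wire:
wire_bytes = 'caf\u00e9'.encode('utf8')
# What the application would see in the environ under the proposal:
seen_in_environ = wire_bytes.decode('latin1')
# The application transcodes it itself, without writing it back to the environ:
recovered = seen_in_environ.encode('latin1').decode('utf8')
assert recovered == 'caf\u00e9'
```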
>
> Similarly all outgoing headers will be Latin1. Thus if you (against
> good sense) decide to put UTF8 into a cookie, you can do:
>
> headers.append(('Set-Cookie',
> unicode_text.encode('UTF8').decode('latin1')))
>
> The server will then encode the text as latin1, sending the UTF8
> bytes. This is lame, but non-ASCII in headers is lame. It would be
> preferable to do:
>
> headers.append(('Set-Cookie',
> urllib.quote(unicode_text.encode('UTF8'))))
>
> This sends different text, but is highly preferable. If you wanted
> to parse a cookie that was set as UTF8, you'd do:
>
> parse_cookie(environ['HTTP_COOKIE'].encode('latin1').decode('utf8'))
>
> Again, it would be better to do:
>
> parse_cookie(urllib.unquote(environ['HTTP_COOKIE']).decode('utf8'))
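
The snippets quoted above use the Python 2 spellings (urllib.quote, urllib.unquote). As a rough sketch of how the two outgoing options compare on Python 3, assuming `urllib.parse.quote` and `urllib.parse.unquote` as the equivalents and a hypothetical sample value:

```python
from urllib.parse import quote, unquote

unicode_text = 'na\u00efve'

# Option 1: carry the UTF-8 bytes inside a Latin1-decoded native string.
smuggled = unicode_text.encode('utf8').decode('latin1')
# A server following the proposal would encode the header as Latin1,
# which puts the original UTF-8 bytes on the wire.
assert smuggled.encode('latin1') == unicode_text.encode('utf8')

# Option 2: percent-encode, so the header value stays pure ASCII.
quoted = quote(unicode_text.encode('utf8'))
print(quoted)  # na%C3%AFve
# On Python 3, unquote() decodes the percent-escapes as UTF-8 by default,
# so no separate .decode('utf8') step is needed.
assert unquote(quoted) == unicode_text
```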
>
> Other variables like environ['wsgi.url_scheme'],
> environ['CONTENT_TYPE'], etc, will be native strings. A Python 3
> hello world app will then look like:
>
> def hello_world(environ):
>     return ('200 OK', [('Content-type', 'text/html; charset=utf8')],
>             ['Hello World!'.encode('utf8')])
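
To make the type split concrete, here is a small sketch that calls the quoted app and checks which parts are native strings and which are bytes. The calling convention (returning a tuple instead of using start_response) is the one used in the quoted example, not a settled interface, and the empty environ is just a placeholder.

```python
def hello_world(environ):
    return ('200 OK', [('Content-type', 'text/html; charset=utf8')],
            ['Hello World!'.encode('utf8')])


# Hypothetical caller: unpack the tuple the way a server might.
status, headers, app_iter = hello_world({})

# Status and headers are native strings...
assert isinstance(status, str)
assert all(isinstance(name, str) and isinstance(value, str)
           for name, value in headers)
# ...while the response body stays byte-oriented.
body = b''.join(app_iter)
assert body == b'Hello World!'
```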
>
> start_response and changes to wsgi.input are incidental to what I'm
> proposing here (except that wsgi.input will be bytes); we can decide
> about them separately.
>
>
>
> Outstanding issues:
>
> Well, the biggie: is it right to use native strings for the environ
> values, and response status/headers? Specifically, tricks like the
> latin1 transcoding won't work in Python 2, but will in Python 3. Is
> this weird? Or just something you have to think about when using
> the two Python versions?
>
> What happens if you give unicode text in the response headers that
> cannot be encoded as Latin1?
>
> Should some things specifically be ASCII? E.g., status.
>
> Should some things be unicode on Python 2?
>
> Is there a common case here that would be inefficient?
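
On the question above about response headers that cannot be encoded as Latin1: on Python 3 the encode step itself raises UnicodeEncodeError, so one possible answer is for the server to turn that into an explicit error. The sketch below is only an assumption about how a server might behave; `encode_header_value` is a hypothetical helper, and the proposal does not decide this.

```python
def encode_header_value(value):
    """Encode a native-string header value to bytes for the wire.

    Hypothetical server-side helper: refuses values outside Latin1
    instead of silently dropping or replacing characters.
    """
    try:
        return value.encode('latin1')
    except UnicodeEncodeError as exc:
        raise ValueError(
            'header value is not Latin1-encodable: %r' % value) from exc


print(encode_header_value('caf\u00e9'))  # b'caf\xe9'

try:
    encode_header_value('\u20ac')  # the euro sign is not in Latin1
except ValueError as err:
    print('rejected:', err)
```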
>
>
>
> --
> Ian Bicking | http://blog.ianbicking.org | http://topplabs.org/civichacker