[docs] [issue16679] Add advice about non-ASCII wsgiref PATH_INFO
Andrew Clover
report at bugs.python.org
Thu Apr 21 13:55:05 EDT 2016
Andrew Clover added the comment:
> Why is only PATH_INFO encoded in such a manner, while QUERY_STRING is passed without any changes and does not require any latin-1 to utf-8 recoding?
Laziness: QUERY_STRING should be pure-ASCII, making any such transcoding a no-op.
In principle a user agent *can* submit non-ASCII characters in a query string without %-encoding them, but it's not standards-conformant and most browsers don't usually do it (exception: apparently curl as above), so it's not worth adding a layer of hopefully-fixing-but-potentially-mangling to this variable to support a situation that shouldn't arise for normal requests.
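As a small sketch of why the transcoding is a no-op for a conformant query string (the parameter name `q` and the value are made up for illustration):

```python
from urllib.parse import parse_qs, urlencode

# A conformant client %-encodes non-ASCII characters, so the query string
# that reaches the server contains only ASCII.
query = urlencode({"q": "caf\u00e9"})
print(query)  # q=caf%C3%A9

# The latin-1 -> utf-8 round trip used for PATH_INFO changes nothing here.
recoded = query.encode("latin-1").decode("utf-8")
print(recoded == query)  # True

# The non-ASCII character only reappears once the application URL-decodes.
params = parse_qs(query)
print(ascii(params["q"][0]))  # 'caf\xe9'
```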
PATH_INFO only requires special handling because of the sad, sad historical artefact of the CGI spec requiring it to have URL-decoding applied to it at the gateway, thus making the non-ASCII characters pop out of the percentage woodwork.
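A minimal sketch of that effect and of the usual recoding, assuming the client sent UTF-8. The helper names `gateway_path_info` and `recode_path_info` are hypothetical, not part of wsgiref:

```python
from urllib.parse import unquote_to_bytes


def gateway_path_info(raw_path):
    """Imitate a gateway: URL-decode the path, then expose the bytes as latin-1 text."""
    return unquote_to_bytes(raw_path).decode("latin-1")


def recode_path_info(environ):
    """Recover the intended text, assuming (not knowing) the bytes were UTF-8."""
    raw = environ.get("PATH_INFO", "")
    return raw.encode("latin-1").decode("utf-8", "replace")


environ = {"PATH_INFO": gateway_path_info("/caf%C3%A9")}
print(ascii(environ["PATH_INFO"]))       # '/caf\xc3\xa9'
print(ascii(recode_path_info(environ)))  # '/caf\xe9'
```

The "replace" error handler is a choice made for this sketch; an application might prefer to reject paths that are not valid UTF-8.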
@Graham, can you share more about how those test results were generated and displayed? The Gunicorn results are about what I would expect - the double-decoding of PATH_INFO is arguably undesirable when curl submits raw bytes, but ultimately that's an unspecified situation so I don't really care.
The output from Apache, on the other hand, is odd - something appears to have mangled the results at the reporting stage as not only is there double-decoding but also some double-backslashes. It looks like the strings have been put through ascii(repr()) or something?
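To illustrate the kind of reporting artefact meant here: this is only a guess at the mechanism, and the calls actually used to produce the Apache report are not known. In Python 3, escaping a string twice is enough to double the backslashes:

```python
# PATH_INFO as a WSGI application sees it: UTF-8 bytes viewed as latin-1 text.
path_info = "/caf\xc3\xa9"

once = ascii(path_info)  # non-ASCII characters become \xNN escapes
twice = ascii(once)      # the backslashes from the first pass get escaped again

print(once)
print(twice)
print(once.count("\\"), twice.count("\\"))
```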
----------
nosy: +Andrew Clover
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16679>
_______________________________________