[Web-SIG] WSGI for Python 3

Sat Jul 17 06:28:53 CEST 2010

On Saturday, July 17, 2010, Ian Bicking <ianb at colorstudy.com> wrote:
> On Fri, Jul 16, 2010 at 6:20 PM, Chris McDonough <chrism at plope.com> wrote:
>
>> What are the concrete problems you envision with text request headers,
>> text (URL-quoted) path, and text response status and headers?
>
> Documentation is the main reason.  For example, the documentation for
> making sense of path_info segments in a WSGI that used unicodey-strings
> would, as I understand it, read something like this:
>
> Nah, not nearly that hard:
>
> path_info = urllib.parse.unquote_to_bytes(environ['wsgi.raw_path_info']).decode('UTF-8')
>
> I don't see the problem?  If you want to distinguish %2f from /, then you'll do it slightly differently, like:
>
> path_parts = [
>     urllib.parse.unquote_to_bytes(p).decode('UTF-8')
>     for p in environ['wsgi.raw_path_info'].split('/')]
>
> This second recipe is impossible to do currently with WSGI.
> So... before jumping to conclusions, what's the hard part with using

Sorry, it is not that simple. What everyone is ignoring is that the
web server normally also normalizes SCRIPT_NAME and PATH_INFO. That
is, '..' segments are removed. By passing the raw URL through to the
application, you force every application to deal with that as well,
with the possibility of directory traversal attacks when people get it
wrong and the URL somehow maps to file system resources. It is a huge
can of worms which, at the moment, the web server deals with.
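
To make the concern concrete, here is a minimal sketch of the kind of
check an application would need if it were handed a raw, still
URL-quoted path. The helper name safe_segments is hypothetical, and
rejecting '..' outright (rather than collapsing it the way a web server
would) is an assumption made to keep the example short.

```python
from urllib.parse import unquote_to_bytes


def safe_segments(raw_path_info):
    """Decode a raw, URL-quoted path into segments safe to map to files.

    Hypothetical helper: raises ValueError for any segment that could
    escape the directory the application is serving from.
    """
    segments = []
    for part in raw_path_info.split('/'):
        # Decode only after splitting, so an encoded '%2f' stays inside
        # its segment instead of becoming a path separator.
        decoded = unquote_to_bytes(part).decode('UTF-8')
        if decoded in ('', '.'):
            continue  # empty and current-directory segments add nothing
        if (decoded == '..' or '/' in decoded
                or '\\' in decoded or '\x00' in decoded):
            raise ValueError('unsafe path segment: %r' % decoded)
        segments.append(decoded)
    return segments
```

Note that '%2e%2e' decodes to '..', so the check has to run on the
decoded segments, not on the raw string. That is exactly the sort of
detail each application would have to get right on its own.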

I have other issues with the raw stuff, but I haven't yet read the
last dozen messages in this discussion, so I will leave those points
for another time.

Graham