[Web-SIG] urllib.unquote in paste.httpserver prevents slashes in path segments

And Clover and-py at doxdesk.com
Thu Mar 17 22:02:04 CET 2011


On Thu, 2011-03-17 at 19:10 +0100, Florian Friesdorf wrote:
> I think paste.httpserver.WSGIHandlerMixin.wsgi_setup should not
> urllib.unquote the path before setting it in the wsgi environment

I'm afraid it must. This is something the WSGI specification inherits
from CGI.

Yes, it was a terrible decision to have SCRIPT_NAME and PATH_INFO
automatically unescaped, as it loses the distinction between ‘%2F’ and
‘/’, and has resulted in endless problems with non-ASCII characters that
could otherwise have been handled perfectly well as %-sequences.
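
To make the loss concrete, here is a minimal sketch. It assumes Python 3,
where urllib.unquote lives at urllib.parse.unquote; the example paths and
the split_then_unquote helper are made up for illustration.

```python
from urllib.parse import unquote

# Two different raw request paths (hypothetical examples).
raw_two_segments = "/files/a/b"    # three segments: "files", "a", "b"
raw_one_segment = "/files/a%2Fb"   # two segments: "files", "a/b"

# Unquoting the whole path, as a CGI-style server does for PATH_INFO,
# makes the two requests indistinguishable.
unquoted_two = unquote(raw_two_segments)
unquoted_one = unquote(raw_one_segment)

def split_then_unquote(raw_path):
    """Hypothetical helper: split on "/" first, then unquote each segment."""
    return [unquote(segment) for segment in raw_path.split("/")[1:]]
```

Splitting before unquoting keeps the distinction, but it needs the raw path,
which is exactly what the server has already thrown away.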

But that decision was taken a couple of decades ago and there's not
really much we can do about it now. CGI may be an anachronism, but it is
still widely used and its assumptions are still felt through Apache, IIS
and WSGI.

> By urllib.unquoting it is not possible to
> have urllib.quoted slashes within one path segment.

Correct. And neither Apache nor IIS allows %2F to be used within a path
segment either, so really if you want to write a portable web app you
simply have to avoid them (along with %00 and %5C). It is not currently
practical to include any arbitrary byte sequence in a URL path segment,
even though by the URL specification you should be able to.
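
As a rough sketch of what "avoid them" might look like in an app that
generates its own URLs: the helper name is_portable_segment is hypothetical,
and the character list is only the three mentioned above, not a complete
inventory of what every server rejects.

```python
# Characters named above that are not portable inside a path segment,
# even when percent-encoded: "/" (%2F), NUL (%00) and "\" (%5C).
UNSAFE_IN_SEGMENT = ("/", "\x00", "\\")

def is_portable_segment(segment):
    """Hypothetical check: True if the decoded segment avoids those characters."""
    return not any(ch in segment for ch in UNSAFE_IN_SEGMENT)
```

An app could run such a check before building a URL from user-supplied names
and pick some other encoding of its own for the values that fail.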

It's annoying, it's inelegant, it's limiting. But none of our attempts
to extend or replace it for non-CGI-based servers (see past list
discussion on path-info-raw or standardising REQUEST_URI) have come to
any acceptable conclusion. We are stuck with it for the foreseeable future.
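
For what it's worth, the non-portable workaround usually looks something
like the sketch below. It assumes the server happens to put a REQUEST_URI
key in the environ, which WSGI does not require; the function name
raw_path_segments is hypothetical, and the sketch ignores SCRIPT_NAME
prefixes and absolute-form request URIs.

```python
from urllib.parse import unquote

def raw_path_segments(environ):
    """Return decoded path segments, keeping %2F inside a segment when possible."""
    raw = environ.get("REQUEST_URI")  # non-standard; only some servers set it
    if raw is None:
        # Fallback: PATH_INFO is already unquoted, so an encoded slash
        # can no longer be told apart from a segment separator.
        return environ.get("PATH_INFO", "").split("/")[1:]
    path = raw.split("?", 1)[0]       # drop the query string
    return [unquote(segment) for segment in path.split("/")[1:]]
```

Code like this behaves differently from one server to the next, which is
why it is no substitute for a standard.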

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com
gtalk:chat?jid=bobince at gmail.com


