[Web-SIG] URL quoting in WSGI (or the lack thereof)

Ian Bicking ianb at colorstudy.com
Mon Jan 21 02:30:20 CET 2008


Luis Bruno wrote:
> Hello y'all, delurking,
> 
> I'm using a /-delimited path, %-encoding each literal '/' appearing in 
> the path segments. I was not amused to see egg:Paste#http urldecoding 
> the whole PATH_INFO.

Unfortunately this is in the WSGI spec, so it's not Paste#http so much 
as WSGI that demands this.
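
To see what gets lost, here is a minimal Python 3 sketch that uses only `urllib.parse.unquote` from the standard library. The helper names are hypothetical. It uses the path from Luis's example URL and compares two approaches: decoding the whole path before splitting it, which is what a server must do to produce PATH_INFO, and splitting the still-quoted path first, then decoding each segment.

```python
from urllib.parse import unquote


def split_after_decoding(path):
    """Decode the whole path first (as PATH_INFO requires), then split on '/'."""
    return unquote(path).strip("/").split("/")


def split_before_decoding(path):
    """Split the still-quoted path on '/', then decode each segment on its own."""
    return [unquote(segment) for segment in path.strip("/").split("/")]


raw = "/catalog/NEC/Computers/Laptops/LN500%2F9DW/"
print(split_after_decoding(raw))
# ['catalog', 'NEC', 'Computers', 'Laptops', 'LN500', '9DW']
print(split_before_decoding(raw))
# ['catalog', 'NEC', 'Computers', 'Laptops', 'LN500/9DW']
```

After the server has decoded the path, nothing in PATH_INFO tells the application which slashes arrived as %2F.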

I think CGI implementations partly handle this through REQUEST_URI, 
which contains the still-quoted value.  But relating REQUEST_URI to 
SCRIPT_NAME/PATH_INFO is awkward, and keeping the same information in 
two places can lead to errors and unclear situations when the two 
copies don't match up.
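
The sketch below shows what relating REQUEST_URI to SCRIPT_NAME/PATH_INFO involves. It is a hypothetical helper, `raw_path_info`, written for Python 3. It assumes the server happens to supply REQUEST_URI, which WSGI does not require. The helper looks for the quoted prefix that decodes to SCRIPT_NAME, then checks that the remainder decodes to PATH_INFO. It gives up (returns None) whenever the two copies disagree, for example after a server-side rewrite.

```python
from urllib.parse import unquote, urlsplit


def raw_path_info(environ):
    """Hypothetical helper: recover a still-quoted PATH_INFO, or return None.

    Assumes the server provides REQUEST_URI (a CGI-style extra, not a
    required WSGI key).
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is None:
        return None
    raw_path = urlsplit(request_uri).path  # drop any query string
    script_name = environ.get("SCRIPT_NAME", "")
    path_info = environ.get("PATH_INFO", "")
    # Try each '/' boundary (and the end of the path) as the end of the
    # quoted SCRIPT_NAME prefix. latin-1 mirrors how PEP 3333 servers
    # present the raw request bytes as native strings.
    boundaries = [i for i, ch in enumerate(raw_path) if ch == "/"]
    boundaries.append(len(raw_path))
    for end in boundaries:
        if unquote(raw_path[:end], encoding="latin-1") != script_name:
            continue
        rest = raw_path[end:]
        # Only trust the quoted remainder if it agrees with PATH_INFO.
        if unquote(rest, encoding="latin-1") == path_info:
            return rest
    return None


# "/shop" is an invented mount point for this example.
environ = {
    "REQUEST_URI": "/shop/catalog/NEC/Computers/Laptops/LN500%2F9DW/?page=2",
    "SCRIPT_NAME": "/shop",
    "PATH_INFO": "/catalog/NEC/Computers/Laptops/LN500/9DW/",
}
print(raw_path_info(environ))
# /catalog/NEC/Computers/Laptops/LN500%2F9DW/
```

The final PATH_INFO check is the "duplicate information" problem in miniature. If the values disagree, the helper cannot tell which one is right, so it returns None.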

> Ben Bangert wrote:
>> This recently became an issue when a user noticed that the %2B URL 
>> encoding for a + sign had turned into a space when it hit their app.
> A swift monkey-patch to 
> paste.httpserver.py:WSGIHandlerMixin.wsgi_setup() later, and 
> ORIGINAL_PATH_INFO is part of the WSGI spec in my world. The following 
> URL now Does The Right Thing:
> 
> http://127.0.0.1:5000/catalog/NEC/Computers/Laptops/LN500%2F9DW/

It would be the Right Thing, except for not being WSGI.  I made note of 
this issue on the WSGI 2.0 ideas page, but I don't think anyone 
(including myself) has proposed any good resolution.  Diverging from CGI 
and leaving PATH_INFO/SCRIPT_NAME quoted would work.  But it's liable to 
lead to bugs, since it's a fairly subtle change: for most applications the 
semantics won't change, and people won't realize their code is broken for 
some corner case.  I suppose we could remove SCRIPT_NAME and PATH_INFO 
entirely and replace them with new keys.
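
A hypothetical exact-match router shows why the breakage would be easy to miss. The `ROUTES` table and `dispatch` function are invented for this sketch. If PATH_INFO were left quoted, plain ASCII paths would behave exactly as before, so most tests would still pass. Any route containing a character that clients percent-encode, such as a space, would silently stop matching.

```python
from urllib.parse import quote

# Hypothetical route table written against today's decoded PATH_INFO.
ROUTES = {
    "/about": "about page",
    "/hello world": "greeting page",
}


def dispatch(path_info):
    """Exact-match lookup; None stands in for a 404."""
    return ROUTES.get(path_info)


for decoded in ROUTES:
    # quote() approximates what a quoted PATH_INFO would carry
    # for a typical client request.
    quoted = quote(decoded)
    print(repr(decoded), dispatch(decoded), dispatch(quoted))
# '/about' about page about page
# '/hello world' greeting page None
```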

   Ian



