[Web-SIG] URL quoting in WSGI (or the lack thereof)

Robert Brewer fumanchu at aminus.org
Sat Jan 19 20:13:36 CET 2008


Luis Bruno wrote:
> I'm using a /-delimited path, %-encoding each literal '/' appearing in
> the path segments. I was not amused to see egg:Paste#http urldecoding
> the whole PATH_INFO.

All HTTP URIs are /-delimited, and any '/' appearing in a single segment
that is not intended to participate in the hierarchy semantics must be
%-encoded before transmitting it over HTTP. I think that's what you're
saying above, but I don't understand why decoding on the server or
gateway is a problem. Perhaps you could expand on that: when you say
"I'm using", where is that? Inside a WSGI application?
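
To make the encoding rule above concrete, here is a minimal sketch in
Python 3 (an assumption; nothing in this thread specifies a version).
The helper name build_path is hypothetical; only urllib.parse.quote is a
real library call. Passing safe="" makes quote() escape '/' as well, so
a literal slash inside a segment cannot be mistaken for a delimiter:

```python
from urllib.parse import quote


def build_path(segments):
    """Join path segments with '/', %-encoding each segment first.

    Hypothetical helper for illustration only. safe="" tells quote()
    to escape '/' too, so a slash inside a segment becomes %2F.
    """
    return "/" + "/".join(quote(s, safe="") for s in segments) + "/"


path = build_path(["catalog", "NEC", "Computers", "Laptops", "LN500/9DW"])
print(path)
```

The last segment here is the product code "LN500/9DW", which is the case
Luis describes.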

> Ben Bangert wrote:
> > This recently became an issue, when a user noticed that the %2B URL
> > encoding for a + sign, had turned into a space when it hit their
> > app.
> 
> A swift monkey-patch to
> paste.httpserver.py:WSGIHandlerMixin.wsgi_setup()
> later, and ORIGINAL_PATH_INFO is part of the WSGI spec in my world.
> The following URL now Does The Right Thing:
> 
> http://127.0.0.1:5000/catalog/NEC/Computers/Laptops/LN500%2F9DW/

Platonic Capital Letters won't get you very far with this crowd. You
have to explain why you think the application should receive %XX encoded
URIs instead of decoded ones. What's the benefit? I only see a con:
every piece of middleware that cares has to repeat the decoding of
PATH_INFO and SCRIPT_NAME, wasting CPU and memory.
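
For reference, a small Python 3 sketch (again an assumption about the
language version) of what the two orderings do to the URL path quoted
above. It is not taken from Paste or CherryPy; the variable names are
illustrative. Decoding the whole path and then splitting cannot tell an
encoded '/' from a delimiter, while splitting first and decoding each
segment keeps them apart:

```python
from urllib.parse import unquote

# The path portion of the example URL, still %-encoded.
raw_path = "/catalog/NEC/Computers/Laptops/LN500%2F9DW/"

# Decode the whole path, then split: the %2F turns into a delimiter.
decoded_then_split = unquote(raw_path).strip("/").split("/")

# Split on the literal '/' first, then decode each segment.
split_then_decoded = [unquote(s) for s in raw_path.strip("/").split("/")]

print(decoded_then_split)
print(split_then_decoded)
```

The first list has six segments ending in 'LN500', '9DW'; the second has
five, ending in 'LN500/9DW'.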

> Robert Brewer wrote:
> > I changed CP's wsgiserver to do decoding that very day.
> > So I think the answer is "yes".
> 
> IMHO "yes" is the wrong answer

Why?

> I am also very unsure about what is the right answer.

According to [1], the right answer is "yes":

    The PATH_INFO metavariable specifies a path to be interpreted
    by the CGI script. It identifies the resource or sub-resource
    to be returned by the CGI script, and it is derived from the
    portion of the URI path following the script name but preceding
    any query data. The syntax and semantics are similar to a
    decoded HTTP URL 'path' token (defined in RFC 2396 [4]), with
    the exception that a PATH_INFO of "/" represents a single void
    path segment.


Robert Brewer
fumanchu at aminus.org

[1] http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.6


