[Web-SIG] URL quoting in WSGI (or the lack thereof)
Robert Brewer
fumanchu at aminus.org
Sat Jan 19 20:13:36 CET 2008
Luis Bruno wrote:
> I'm using a /-delimited path, %-encoding each literal '/' appearing in
> the path segments. I was not amused to see egg:Paste#http urldecoding
> the whole PATH_INFO.
All HTTP URIs are /-delimited, and any '/' appearing in a single segment
that is not intended to participate in the hierarchy semantics must be
%-encoded before transmitting it over HTTP. I think that's what you're
saying above, but I don't understand why decoding on the server or
gateway is a problem. Perhaps you could expand on that: when you say
"I'm using", where is that? Inside a WSGI application?
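As a minimal sketch of the client-side encoding described above, assuming Python's standard urllib.parse and a hypothetical helper named build_path (neither is prescribed by anyone in this thread), each segment is %-encoded on its own so that a literal '/' inside a segment cannot be mistaken for a hierarchy delimiter. The segment values are borrowed from the URL quoted further down in this message.

```python
from urllib.parse import quote

def build_path(segments):
    """Hypothetical helper: join path segments into a URI path.

    safe="" makes quote() escape '/' as well, so a slash that belongs
    to a segment's data is sent as %2F rather than as a delimiter.
    """
    encoded = [quote(seg, safe="") for seg in segments]
    return "/" + "/".join(encoded) + "/"

path = build_path(["catalog", "NEC", "Computers", "Laptops", "LN500/9DW"])
print(path)
```

Only the last segment contains a character that needs escaping here; the other segments pass through unchanged.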
> Ben Bangert wrote:
> > This recently became an issue, when a user noticed that the %2B URL
> > encoding for a + sign, had turned into a space when it hit their
> > app.
>
> A swift monkey-patch to
> paste.httpserver.py:WSGIHandlerMixin.wsgi_setup()
> later, and ORIGINAL_PATH_INFO is part of the WSGI spec in my world.
> The following URL now Does The Right Thing:
>
> http://127.0.0.1:5000/catalog/NEC/Computers/Laptops/LN500%2F9DW/
Platonic Capital Letters won't get you very far with this crowd. You
have to explain why you think the application should receive %XX encoded
URIs instead of decoded ones. What's the benefit? I only see a con:
every piece of middleware that cares has to repeat the decoding of
PATH_INFO and SCRIPT_NAME, wasting CPU and memory.
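To make the trade-off concrete, here is a rough sketch, assuming a hypothetical gateway helper build_path_environ that decodes the path once and also keeps the raw form under ORIGINAL_PATH_INFO. That key is not part of the WSGI spec; the name comes from the monkey-patch quoted above, and its exact semantics here are an assumption. The sketch shows what an application can recover from each form.

```python
from urllib.parse import unquote

def build_path_environ(raw_path):
    """Hypothetical gateway step: decode the path once for everyone."""
    return {
        "SCRIPT_NAME": "",
        "PATH_INFO": unquote(raw_path),
        # Assumption: non-standard key holding the undecoded path.
        "ORIGINAL_PATH_INFO": raw_path,
    }

def segments_from_decoded(environ):
    # Split the already-decoded path; an encoded '/' is now
    # indistinguishable from a delimiter.
    return [s for s in environ["PATH_INFO"].split("/") if s]

def segments_from_raw(environ):
    # Split first, then decode each segment; every consumer that
    # wants this view has to repeat the decoding work itself.
    raw = environ["ORIGINAL_PATH_INFO"]
    return [unquote(s) for s in raw.split("/") if s]

env = build_path_environ("/catalog/NEC/Computers/Laptops/LN500%2F9DW/")
print(segments_from_decoded(env))
print(segments_from_raw(env))
```

With the decoded form the final segment splits in two; with the raw form it stays whole, at the cost of each consumer calling unquote on its own.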
> Robert Brewer wrote:
> > I changed CP's wsgiserver to do decoding that very day.
> > So I think the answer is "yes".
>
> IMHO "yes" is the wrong answer
Why?
> I am also very unsure about what is the right answer.
According to [1], the right answer is "yes":
    The PATH_INFO metavariable specifies a path to be interpreted
    by the CGI script. It identifies the resource or sub-resource
    to be returned by the CGI script, and it is derived from the
    portion of the URI path following the script name but preceding
    any query data. The syntax and semantics are similar to a
    decoded HTTP URL 'path' token (defined in RFC 2396 [4]), with
    the exception that a PATH_INFO of "/" represents a single void
    path segment.
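One way to read the passage above, sketched with Python's urllib.parse and a hypothetical function cgi_path_vars (an illustration only, not how any particular server does it): take the URI path, strip the script name, drop the query data, and decode what remains. The sketch assumes the script name contains no %-escapes and is a plain prefix of the path.

```python
from urllib.parse import urlsplit, unquote

def cgi_path_vars(request_uri, script_name):
    """Hypothetical derivation of CGI-style path variables."""
    parts = urlsplit(request_uri)
    raw_path = parts.path
    # Simplification: a plain prefix test, with no check that the
    # script name ends on a segment boundary.
    if not raw_path.startswith(script_name):
        raise ValueError("script name is not a prefix of the path")
    return {
        "SCRIPT_NAME": script_name,
        # Path decoding only resolves %XX escapes; '+' is left alone.
        "PATH_INFO": unquote(raw_path[len(script_name):]),
        # Query data is kept separate and is not decoded here.
        "QUERY_STRING": parts.query,
    }

print(cgi_path_vars("/app/a%2Bb/c%20d?x=1+2", "/app"))
```

Note that unquote turns %2B into '+' and leaves it there. Mapping '+' to a space is a form-encoding convention for query data (urllib.parse.unquote_plus), so applying that to a path, or decoding twice, is one plausible way a %2B could end up as a space.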
Robert Brewer
fumanchu at aminus.org
[1] http://cgi-spec.golux.com/draft-coar-cgi-v11-03-clean.html#6.1.6