[Web-SIG] URL quoting in WSGI (or the lack thereof)

Brian Smith brian at briansmith.org
Tue Jan 22 17:44:43 CET 2008


Luis Bruno wrote:
> Ian Bicking wrote:
> > But relating REQUEST_URI with SCRIPT_NAME/PATH_INFO is awkward and 
> > having the information in duplicate places can lead to errors and 
> > unclear situations if they don't match up properly.

I don't understand this argument. WSGI gateways just need to parse the
request URL correctly, and then everything *will* match up correctly,
AFAICT. Providing an undecoded REQUEST_URI that an application can parse
on its own is much better than what CherryPy is doing, and it is useful
for other reasons as well.
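
As a rough sketch of what "parse on its own" could look like, assuming the
gateway exposes the raw request target under a REQUEST_URI key (which WSGI
1.0 does not require) and using Python 3's urllib.parse; the helper name
path_segments is made up for illustration:

```python
from urllib.parse import unquote, urlsplit

def path_segments(environ):
    """Split the raw path on "/" first, then decode each segment."""
    # REQUEST_URI is assumed to hold the undecoded request target.
    raw_path = urlsplit(environ["REQUEST_URI"]).path
    # Splitting before decoding keeps an encoded slash inside its segment.
    return [unquote(seg) for seg in raw_path.split("/") if seg]

environ = {"REQUEST_URI": "/Laptops/LN500%2F9DW/?color=black"}
segments = path_segments(environ)
print(segments)
```

Because the split happens before decoding, "LN500%2F9DW" stays a single
segment and only then becomes "LN500/9DW".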

> I'm going with CherryPy's on this: don't decode "%2F".

CherryPy is not implementing the WSGI 1.0 specification correctly. And,
CherryPy's behavior here is harmful, because applications have no way of
knowing whether a "%2F" in PATH_INFO is an undecoded slash (sent as "%2F")
or a literal "%2F" (sent as "%252F").
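
The ambiguity can be illustrated with a small sketch. The function
decode_except_slash below is a hypothetical stand-in for "decode everything
except %2F"; it is not CherryPy's actual code, just an assumption about the
behavior being discussed:

```python
import re
from urllib.parse import unquote

def decode_except_slash(raw_path):
    """Percent-decode a path but leave every "%2F" escape untouched."""
    # The capturing group makes re.split keep the "%2F" delimiters.
    parts = re.split(r"(%2F)", raw_path, flags=re.IGNORECASE)
    return "".join(p if p.upper() == "%2F" else unquote(p) for p in parts)

encoded_slash = decode_except_slash("/a%2Fb")     # client sent an encoded "/"
literal_text = decode_except_slash("/a%252Fb")    # client sent the text "%2F"
print(encoded_slash, literal_text)
```

Both request paths yield the same PATH_INFO, so the application cannot tell
which one the client actually sent.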

> > Luis Bruno wrote:
> >> I was not amused to see egg:Paste#http urldecoding the 
> >> whole PATH_INFO.
> > Unfortunately this is in the WSGI spec, so it's not 
> > Paste#http so much as WSGI that demands this.
> 
> I skimmed PEP 333 before grumbling and I've just re-read it; 
> didn't find it, unless you're referring to the code in "URL 
> Reconstruction" section. 
> If you're referring[*] to the CGI 1.1 draft linked in "environ 
> Variables", I think it supports my position that unquoting(PATH_INFO) 
> was not the correct thing to do.

PEP 333 defers the definition of PATH_INFO to the CGI specification:
"The environ dictionary is required to contain these CGI environment
variables, as defined by the Common Gateway Interface specification
[2]". That version of the CGI specification clearly expects PATH_INFO to
be decoded. Section 3.2 says "'enc-path-info' is a URL-encoded version
of PATH_INFO". The implication is that PATH_INFO is *not* URL-encoded.
Section 6.1.6 is more explicit, saying: "The syntax and semantics are
similar to a decoded HTTP URL 'path' token (defined in RFC 2396 [4]),
with the exception that a PATH_INFO of "/" represents a single void path
segment." 

Furthermore, the URL reconstruction section and the CGI WSGI gateway
both also imply that PATH_INFO has already been decoded.
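
A minimal sketch of why the URL reconstruction code implies this, modeled on
the quote() calls in PEP 333's "URL Reconstruction" section but written
against Python 3's urllib.parse; the helper name reconstruct_path is made up
for illustration:

```python
from urllib.parse import quote

def reconstruct_path(environ):
    """Re-encode SCRIPT_NAME and PATH_INFO, as URL reconstruction does."""
    return (quote(environ.get("SCRIPT_NAME", ""))
            + quote(environ.get("PATH_INFO", "")))

# A gateway that decodes PATH_INFO: the result is a valid URL path.
decoded = reconstruct_path({"PATH_INFO": "/Laptops/LN500/9DW/"})
# A gateway that leaves "%2F" in PATH_INFO: quote() escapes the "%" again.
undecoded = reconstruct_path({"PATH_INFO": "/Laptops/LN500%2F9DW/"})
print(decoded, undecoded)
```

Applying quote() only makes sense if PATH_INFO is already decoded; an
undecoded "%2F" comes out double-encoded as "%252F".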

> > [/Laptops/LN500%2F9DW/ ] would be the Right Thing, except for not 
> > being WSGI.
> Looks to me like a good candidate for an amendment.
> 
> What's the next step?

Something so fundamental as this cannot be changed with a simple
amendment to the existing specification. Such a change would break
currently-conforming gateways and applications. An amendment that
recommends, but does not require, REQUEST_URI is a much better option.

- Brian


