[Web-SIG] URL quoting in WSGI (or the lack thereof)

Brian Smith brian at briansmith.org
Tue Jan 22 19:34:24 CET 2008


Luis Bruno wrote:
> Brian Smith wrote:
> > An amendment that recommends, but does not require, 
> > REQUEST_URI is a much better option.
> 
> Thereby forcing me to shop around for a WSGI server that 
> actually puts the recommendation into practice? Because I 
> want to keep my %-encoded characters? Which I encoded for, 
> you know, escaping them from the usual processing? Smells of 
> mistake.

You already have to shop around for a WSGI server that can distinguish
between encoded and unencoded slashes in PATH_INFO, because the WSGI
specification doesn't require the WSGI gateway to distinguish between
them.

I agree that the WSGI 1.0 specification is not good in this regard.
However, because an application cannot detect whether PATH_INFO has been
decoded or not, the only reasonable thing that it can do is to assume
that the gateway and middleware are following the WSGI specification.
The corollary is that an application shouldn't rely on being able to
distinguish between "%2F" and "/" based on PATH_INFO if it wants to be
portable.
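
To make the ambiguity concrete, here is a minimal sketch in Python 3. It
assumes the gateway's decoding behaves like urllib.parse.unquote, which is
an illustration and not something PEP 333 specifies; the helper name
gateway_decode and the sample paths are hypothetical.

```python
from urllib.parse import unquote


def gateway_decode(raw_path):
    """Stand-in for a gateway that percent-decodes the path once (assumption)."""
    return unquote(raw_path)


# Two different request paths collapse to the same PATH_INFO after decoding,
# so the application cannot tell an encoded slash from a literal one.
same_after_decoding = gateway_decode("/docs/a%2Fb") == gateway_decode("/docs/a/b")

# Conversely, one observed PATH_INFO value has two possible origins:
# a gateway that did not decode "/docs/a%2Fb", or one that decoded "/docs/a%252Fb".
observed = "/docs/a%2Fb"
possible_origins = [
    raw for raw in ("/docs/a%2Fb", "/docs/a%252Fb")
    if raw == observed or gateway_decode(raw) == observed
]
```

Both candidate origins match the observed value, which is why the
application can only assume the gateway did what the specification says.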

If you really want PATH_INFO to have "%2F" instead of "/", then I
suggest encoding the slashes as "%252F" or "$2F" or something else. Then
your application will be portable.
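
A rough sketch of the "%252F" approach in Python 3 follows. The function
names encode_segment and decode_segments are hypothetical, and the single
unquote call that stands in for the gateway is an assumption about how a
decoding gateway behaves.

```python
from urllib.parse import quote, unquote


def encode_segment(segment):
    """Percent-encode one path segment twice, so that a literal slash
    inside it survives a single decoding pass by the gateway."""
    once = quote(segment, safe="")   # "a/b"   -> "a%2Fb"
    return quote(once, safe="")      # "a%2Fb" -> "a%252Fb"


def decode_segments(path_info):
    """Split an already-decoded PATH_INFO on "/" and then undo the
    application's own layer of encoding in each segment."""
    return [unquote(part) for part in path_info.split("/") if part]


# The application builds a link whose middle segment contains a slash.
url_path = "/files/" + encode_segment("a/b") + "/meta"

# Assumption: the gateway percent-decodes the path exactly once.
path_info = unquote(url_path)

segments = decode_segments(path_info)
```

The application only recovers the right segments if the gateway decodes
exactly once, which is the behaviour this message argues applications
have to assume.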

> This sub-thread starts with me putting an ORIGINAL_PATH_INFO 
> into the environ, which the dispatch code doesn't touch. This 
> forces me to strip the app mount points, reinventing 
> Paste#urlmap. Should REQUEST_URI be touched by dispatch code? 
> If so, PATH_INFO has no use. If not, the duplication Ian 
> Bicking mentioned comes into play.

By definition, the Request URI doesn't change during a request. So,
REQUEST_URI shouldn't be fiddled with by dispatching code, unlike
SCRIPT_NAME and PATH_INFO. Usually, the dispatching code is just
shifting segments of PATH_INFO into SCRIPT_NAME, but SCRIPT_NAME joined
with PATH_INFO and the QUERY_STRING is always constant. So, the problems
with ORIGINAL_PATH_INFO don't apply to REQUEST_URI.
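
The invariant can be sketched in Python as follows. The helper names
shift_segment and full_path are hypothetical, and REQUEST_URI is not a key
defined by PEP 333; it appears here only as the proposed extra key.

```python
def shift_segment(environ):
    """Move the first segment of PATH_INFO onto the end of SCRIPT_NAME.

    Returns the shifted segment, or None when nothing is left to shift.
    """
    path_info = environ.get("PATH_INFO", "")
    if not path_info.startswith("/") or path_info == "/":
        return None
    head, sep, rest = path_info[1:].partition("/")
    environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + "/" + head
    environ["PATH_INFO"] = sep + rest
    return head


def full_path(environ):
    """SCRIPT_NAME joined with PATH_INFO and the query string."""
    path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    query = environ.get("QUERY_STRING", "")
    return path + ("?" + query if query else "")


environ = {
    "SCRIPT_NAME": "",
    "PATH_INFO": "/app/users/42",
    "QUERY_STRING": "x=1",
    "REQUEST_URI": "/app/users/42?x=1",  # assumed key, never modified below
}

before = full_path(environ)
shifted = shift_segment(environ)   # dispatcher consumes the mount point
after = full_path(environ)
```

SCRIPT_NAME and PATH_INFO change on every dispatch step, but their
concatenation does not, and the dispatcher never has a reason to write to
REQUEST_URI.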

> > That version of the CGI specification clearly expects 
> > PATH_INFO to be decoded.
> 
> I agree; I think you should refer to the top of page 14 in 
> RFC 3875, instead of to the 1999 draft. The draft didn't 
> outright forbid multiple path-segments like the RFC does, but 
> was ambiguous enough (your quote):

PEP 333 defers the definition of PATH_INFO to the 1999 draft, not to RFC
3875. So, it doesn't matter what RFC 3875 says.

> Fortunately, the URI spec doesn't repeat the mistake of 
> forbidding %-encoding characters. It does mention that each 
> path-segment should be separately %-decoded, going against 
> the CGI spec which actually forbids multiple segments *in 
> PATH_INFO*. That smells of mistake. Faced with the choice 
> between those specs, I'd prefer not to lose information for 
> mindless compliance with CGI.

I don't care about CGI compatibility. I do depend on WSGI gateways being
compliant with the WSGI specification. 

- Brian


