[Web-SIG] URL quoting in WSGI (or the lack thereof)

Luis Bruno lbruno at 100blossoms.com
Tue Jan 22 19:02:19 CET 2008


Brian Smith wrote:
> An amendment that recommends, but does not require, REQUEST_URI is a 
> much better option.

Thereby forcing me to shop around for a WSGI server that actually puts 
the recommendation into practice? Just because I want to keep my 
%-encoded characters? Characters I encoded precisely to escape them from 
the usual processing? That smells like a mistake.
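A rough illustration of that cost, as a minimal Python sketch: if REQUEST_URI is merely recommended, an application that cares about %-escapes needs a fallback for servers that omit it. The helper name raw_request_path is hypothetical, and it assumes that REQUEST_URI, when present, holds the undecoded path plus an optional query string (not an absolute URI). The fallback can only re-quote the already-decoded SCRIPT_NAME and PATH_INFO, so an escaped slash does not survive.

```python
from urllib.parse import quote


def raw_request_path(environ):
    """Return the request path with its original %-escapes when possible.

    Hypothetical helper: REQUEST_URI is not required by PEP 333, and this
    assumes it holds the undecoded path plus an optional query string.
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is not None:
        # Drop the query string; the path keeps the client's escapes.
        return request_uri.split("?", 1)[0]
    # Fallback: SCRIPT_NAME and PATH_INFO are already %-decoded, so
    # re-quoting cannot tell a literal "/" from an original "%2F".
    return quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
```

With REQUEST_URI set to /app/a%2Fb?x=1 the helper returns /app/a%2Fb; without it, the same request collapses to /app/a/b, indistinguishable from a request for a real subpath.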

This sub-thread started with me putting an ORIGINAL_PATH_INFO key into 
the environ, which dispatch code doesn't touch. That forces me to strip 
the app mount points myself, reinventing Paste#urlmap. Should dispatch 
code also rewrite REQUEST_URI? If so, PATH_INFO becomes redundant. If 
not, we get the duplication Ian Bicking mentioned.
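Stripping a mount point from a raw path is not a plain prefix match, because SCRIPT_NAME arrives decoded while the raw path keeps its escapes. A minimal Python sketch of the piece an application ends up reimplementing when dispatch code leaves the raw path alone; strip_mount_point is a hypothetical name, and the sketch assumes the raw value (e.g. the ORIGINAL_PATH_INFO key discussed here) starts with the escaped form of SCRIPT_NAME.

```python
from urllib.parse import unquote


def strip_mount_point(raw_path, script_name):
    """Remove the mount point (decoded SCRIPT_NAME) from a raw, %-encoded path.

    Hypothetical helper: assumes raw_path is undecoded and begins with
    the escaped form of script_name.
    """
    if not script_name:
        return raw_path
    segments = raw_path.split("/")
    decoded_prefix = ""
    for i in range(1, len(segments)):
        # Decode one raw segment at a time and compare against SCRIPT_NAME.
        decoded_prefix += "/" + unquote(segments[i])
        if decoded_prefix == script_name:
            # Rejoin the untouched raw segments that follow the mount point.
            rest = segments[i + 1:]
            return "/" + "/".join(rest) if rest else ""
        if not script_name.startswith(decoded_prefix):
            break
    raise ValueError("raw path does not start with SCRIPT_NAME")
```

For example, strip_mount_point("/blog/posts/a%2Fb", "/blog") gives "/posts/a%2Fb", keeping the escaped slash that PATH_INFO would have turned into a separator.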

> That version of the CGI specification clearly expects PATH_INFO to be decoded.

I agree; but I think you should cite the top of page 14 of RFC 3875 
instead of the 1999 draft. The draft didn't outright forbid multiple 
path-segments the way the RFC does, but it was ambiguous enough (your quote):

> Section 6.1.6 is more explicit, saying: "The syntax and semantics are
> similar to a decoded HTTP URL 'path' token (defined in RFC 2396 [4])
>   

Don't forget to read the %-decoding rules in RFC 2396's section 2.4.2 if 
you're going to quote "decoded HTTP URL 'path' token".

Fortunately, the URI spec doesn't repeat the mistake of forbidding 
%-encoded characters. It does say that each path-segment should be 
%-decoded separately, which goes against the CGI spec, since that one 
forbids multiple segments *in PATH_INFO*. That smells like a mistake. 
Faced with a choice between those specs, I'd rather not lose information 
for the sake of mindless compliance with CGI.
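The difference between the two decoding orders fits in a few lines of Python. This is a sketch, not a compliant RFC 2396 parser: decode_path_segments is a hypothetical helper that assumes an absolute path starting with "/" and ignores path parameters. It splits on "/" before %-decoding, so "/a%2Fb/c" and "/a/b/c" stay distinct, whereas decoding the whole path first maps both to "/a/b/c".

```python
from urllib.parse import unquote


def decode_path_segments(raw_path):
    """Split a raw URI path on "/" first, then %-decode each segment.

    Hypothetical helper: assumes raw_path starts with "/". An encoded
    "%2F" stays inside its segment instead of becoming a separator.
    """
    # Index 0 is the empty string before the leading "/", so skip it.
    return [unquote(segment) for segment in raw_path.split("/")[1:]]
```

Here decode_path_segments("/a%2Fb/c") yields ["a/b", "c"] while decode_path_segments("/a/b/c") yields ["a", "b", "c"]; unquote("/a%2Fb/c") on its own cannot make that distinction.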


-- 
Luís Bruno

