[Web-SIG] URL quoting in WSGI (or the lack thereof)
Luis Bruno
lbruno at 100blossoms.com
Tue Jan 22 19:02:19 CET 2008
Brian Smith wrote:
> An amendment that recommends, but does not require, REQUEST_URI is a
> much better option.
Thereby forcing me to shop around for a WSGI server that actually puts
the recommendation into practice? Because I want to keep my %-encoded
characters? Which I encoded for, you know, escaping them from the usual
processing? Smells of mistake.
This sub-thread starts with me putting an ORIGINAL_PATH_INFO into the
environ, which the dispatch code doesn't touch. This forces me to strip
the app mount points, reinventing Paste#urlmap. Should REQUEST_URI be
touched by dispatch code? If so, PATH_INFO has no use. If not, the
duplication Ian Bicking mentioned comes into play.
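
A minimal sketch of the mount-point stripping described above, assuming
the server exposes the still-%-encoded path under the ORIGINAL_PATH_INFO
key mentioned in this thread. The helper name strip_mount_point and the
"/wiki" mount point are hypothetical; this is not Paste#urlmap's code.

```python
def strip_mount_point(raw_path, mount_point):
    """Return the part of a still-%-encoded path below mount_point.

    Both arguments are assumed to be raw (undecoded) paths. Returns None
    when raw_path does not live under mount_point.
    """
    mount_point = mount_point.rstrip("/")
    if raw_path == mount_point:
        return ""
    # Require a "/" after the prefix so "/wiki" does not match "/wikipedia".
    if raw_path.startswith(mount_point + "/"):
        return raw_path[len(mount_point):]
    return None


# Hypothetical environ; only the key name comes from the discussion above.
environ = {"ORIGINAL_PATH_INFO": "/wiki/a%2Fb/edit"}
rest = strip_mount_point(environ["ORIGINAL_PATH_INFO"], "/wiki")
# rest still contains %2F, so the application can tell it from a real "/".
```

Because the comparison is done on the raw path, the mount point itself
must be given in the same %-encoded form as the request.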
> That version of the CGI specification clearly expects PATH_INFO to be decoded.
I agree; I think you should refer to the top of page 14 in RFC 3875,
instead of to the 1999 draft. The draft didn't outright forbid multiple
path-segments like the RFC does, but was ambiguous enough (your quote):
> Section 6.1.6 is more explicit, saying: "The syntax and semantics are
> similar to a decoded HTTP URL 'path' token (defined in RFC 2396 [4])
>
Don't forget to read the %-decoding rules in RFC 2396's section 2.4.2 if
you're going to quote "decoded HTTP URL 'path' token".
Fortunately, the URI spec doesn't repeat the mistake of forbidding
%-encoded characters. It does mention that each path-segment should be
separately %-decoded, going against the CGI spec which actually forbids
multiple segments *in PATH_INFO*. That smells of mistake. Faced with the
choice between those specs, I'd prefer not to lose information for
mindless compliance with CGI.
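
To illustrate the information loss, here is a small sketch comparing
decoding the whole path before splitting with decoding each path-segment
separately. It uses Python 3's urllib.parse.unquote; the function names
and the sample path are made up for the example.

```python
from urllib.parse import unquote


def decode_whole(raw_path):
    # Decode first, split second: an encoded "/" becomes a separator.
    return unquote(raw_path).split("/")


def decode_per_segment(raw_path):
    # Split first, decode second: an encoded "/" stays inside its segment.
    return [unquote(segment) for segment in raw_path.split("/")]


raw = "/docs/a%2Fb"
whole = decode_whole(raw)              # ['', 'docs', 'a', 'b']
per_segment = decode_per_segment(raw)  # ['', 'docs', 'a/b']
```

Once the path has been decoded as a whole, nothing distinguishes the
segment "a/b" from the two segments "a" and "b".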
--
Luís Bruno