[Web-SIG] URL quoting in WSGI (or the lack thereof)

James Y Knight foom at fuhm.net
Tue Jan 22 17:02:22 CET 2008


On Jan 22, 2008, at 6:47 AM, Sven Berkvens-Matthijsse wrote:

> Luís Bruno wrote:
>> Ian Bicking wrote:
>>> But relating REQUEST_URI with SCRIPT_NAME/PATH_INFO is awkward and
>>> having the information in duplicate places can lead to errors and
>>> unclear situations if they don't match up properly.
>>
>> True, and you can apply the same reasoning to my suggestion too.
>>
>> Apart from the duplication of information, there's how or where to
>> do the actual decoding. Not everyone is dispatching to a
>> CherryPy-style tree of objects, so putting a %-decoded list of path
>> segments in an environ key doesn't work -- I knew it was a bad idea!
>> I'm going with CherryPy's approach on this: don't decode "%2F". Should other
>> characters be kept encoded?
>
> Yes, in my opinion all encoded characters should remain encoded.
> Otherwise, a path like /whatever/some%252Fthing/blah/ would become
> (after decoding): /whatever/some%2Fthing/blah/ which is certainly not
> what you'd want and/or expect.

Your opinion is irrelevant here: this is specified by the CGI spec,
which WSGI follows for SCRIPT_NAME and PATH_INFO, and it says those
values are already decoded. Yes, agreed, it's not the best spec ever,
but there's nothing you can do about that. FWIW, I think the right
thing for a server to do is to reject any URL going to a WSGI (or CGI)
script with a %2F in it. I believe this is what Apache's CGI host does.
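
A minimal sketch of why the decoded form is ambiguous and what a
rejecting server would check. It assumes Python's urllib.parse.unquote
stands in for whatever decoding the server applies; cgi_path_info and
has_encoded_slash are hypothetical helper names, not part of WSGI or of
any server:

```python
from urllib.parse import unquote


def cgi_path_info(raw_path):
    """Decode a raw URL path once, as CGI-style PATH_INFO is decoded."""
    return unquote(raw_path)


def has_encoded_slash(raw_path):
    """Hypothetical server-side check: True if the raw path contains %2F."""
    return "%2f" in raw_path.lower()


# An encoded slash and a real slash collapse to the same PATH_INFO,
# so the application can no longer tell them apart.
assert cgi_path_info("/whatever/some%2Fthing/blah/") == "/whatever/some/thing/blah/"
assert cgi_path_info("/whatever/some/thing/blah/") == "/whatever/some/thing/blah/"

# Decoding happens once, so %252F comes out as the literal text "%2F".
assert cgi_path_info("/whatever/some%252Fthing/blah/") == "/whatever/some%2Fthing/blah/"

# A server that refuses encoded slashes only needs to look at the raw path.
assert has_encoded_slash("/whatever/some%2Fthing/blah/")
assert not has_encoded_slash("/whatever/some%252Fthing/blah/")
```

Rejecting the request up front sidesteps the ambiguity instead of
trying to carry the encoded form through to the application.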

BTW, for extra fun, you should be considering ";" too.
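
The ";" case is presumably the same problem in a different spot: a
literal ";" can delimit path parameters while "%3B" is plain data, and
one round of decoding makes them indistinguishable. A small sketch,
again assuming urllib.parse.unquote as the decoder, with made-up paths:

```python
from urllib.parse import unquote

# Hypothetical raw paths: in the first, ";" acts as a parameter
# delimiter; in the second, "%3B" is an encoded character inside
# the segment itself.
raw_with_delimiter = "/report;version=2/view"
raw_with_data = "/report%3Bversion=2/view"

# After decoding, both yield the same string, just as "/" and "%2F" do.
decoded_delimiter = unquote(raw_with_delimiter)
decoded_data = unquote(raw_with_data)
assert decoded_delimiter == decoded_data == "/report;version=2/view"
```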

James

