[Web-SIG] PEP 444 (aka Web3)

And Clover and-py at doxdesk.com
Fri Sep 17 11:40:28 CEST 2010


On 09/17/2010 04:21 AM, Ian Bicking wrote:

> Yes, if we get rid of SCRIPT_NAME/PATH_INFO then the problem goes away.  For
> servers without access to the unencoded value, reencoding those values
> doesn't actually lose any information over what we have now, and avoids any
> encoding issues.

It doesn't lose any information, but it also makes script_name/path_info 
inherently unreliable. My fear is that if gateways are allowed to create 
a reconstructed script_name/path_info without clearly signalling they 
have done so, those values will stay unreliable, and server authors 
won't feel the need to get them right since they're broken everywhere 
anyway: the unhappy status quo.

This is why I am continuing to plead for a 'script_name/path_info are 
authoritative' flag in environ that applications can use to detect 
situations where it is safe to go ahead and rely on them. I want to say 
"Unicode paths are supported if your server/gateway does", not "Unicode 
paths might sometimes work, depending on how you configure your server 
and application".
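To make the flag idea concrete, here is a minimal application-side sketch in Python. It assumes PEP 3333-style native-string environ values (one character per original byte) and a hypothetical key, example.paths_authoritative, that no spec or server actually defines. With the flag set, the application decodes PATH_INFO as UTF-8. Without it, only pure-ASCII paths are accepted.

```python
# Hypothetical environ key: not defined by PEP 444, PEP 3333 or any server.
PATHS_AUTHORITATIVE = 'example.paths_authoritative'


def unicode_path_info(environ):
    """Return PATH_INFO as Unicode text, or None if it cannot be trusted.

    Assumes PEP 3333 conventions: each character of the native-string
    PATH_INFO stands for one byte of the original (%-decoded) path.
    """
    try:
        raw = environ.get('PATH_INFO', '').encode('latin-1')
    except UnicodeEncodeError:
        return None  # gateway stored something other than bytes-as-latin-1
    if environ.get(PATHS_AUTHORITATIVE):
        # The gateway vouches for the bytes, so a UTF-8 decode is meaningful.
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            return None
    # Not authoritative: only pure-ASCII paths read the same under every
    # encoding the gateway might have guessed.
    try:
        return raw.decode('ascii')
    except UnicodeDecodeError:
        return None
```

An application could then answer 404, or fall back to ASCII-only routing, whenever the function returns None.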

It is not just CGI that is affected here! IIS does not provide the 
original undecoded path at all, even through ISAPI.

At the moment I am using a 'fixPathInfo' method in my form-reading layer 
to try to compensate as much as possible for the problems of CGI:

   - on Python 2 on Windows, re-read the environment variables using
     ctypes if available, to avoid the mangling caused by reading
     os.environ using mbcs. (This didn't use to work, as old versions
     of IIS deliberately filtered values through mbcs before putting
     them in the environment, but it does now.)

   - on Python 3 on POSIX, re-read the environment variables using
     os.environb if available. Otherwise, try to reverse the faulty
     decoding of os.environ using the surrogateescape error handler,
     where available.

   - on Windows, encode the Unicode environment to bytes using
     ISO-8859-1 if the server is Apache, or UTF-8 if the server is
     IIS. (IIS tries to decode path bytes using UTF-8, falling back
     to mbcs where the input is not valid UTF-8. Unfortunately there
     is no way to tell when this has happened.)

   - when the server is Microsoft-IIS, remove the erroneously repeated
     SCRIPT_NAME components from the front of PATH_INFO. (This is a
     long-standing bug that can be configured away using the
     allowPathInfo/AllowPathInfoForScriptMappings settings, but no
     one does, as doing so breaks ASP.)
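A rough Python 3 sketch of the second and third steps above (the Python 2 ctypes step is not shown). The function names are hypothetical, and the server-name prefixes and encodings come from the description above rather than from any specification. Servers other than Apache and IIS are treated as unrecoverable, which is an added assumption.

```python
import os
import sys


def recover_request_bytes(value, server_software='', on_windows=None):
    """Re-create the raw request bytes behind a CGI variable read as str.

    Returns None when there is no known way to recover them.
    """
    if on_windows is None:
        on_windows = sys.platform == 'win32'
    if not on_windows:
        # POSIX: os.environ decoded the bytes with the filesystem encoding
        # and the surrogateescape handler, so encoding the same way undoes it.
        return value.encode(sys.getfilesystemencoding(), 'surrogateescape')
    try:
        if server_software.startswith('Microsoft-IIS'):
            # IIS decoded as UTF-8 (or, undetectably, as mbcs when the
            # bytes were not valid UTF-8); assume UTF-8.
            return value.encode('utf-8')
        if server_software.startswith('Apache'):
            # Assumed: Apache passes each byte through as one code point.
            return value.encode('iso-8859-1')
    except UnicodeEncodeError:
        return None
    return None  # unknown server: no reliable mapping


def read_cgi_var_bytes(name):
    """Read a CGI variable as bytes, preferring os.environb where it exists."""
    if os.supports_bytes_environ:
        # POSIX on Python 3: the undecoded bytes are available directly.
        return os.environb.get(name.encode('ascii'))
    value = os.environ.get(name)
    if value is None:
        return None
    return recover_request_bytes(value, os.environ.get('SERVER_SOFTWARE', ''))
```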

However, the form layer is not really the right place for these 
hacks. They would be better done in the stdlib CGI handler 
(wsgiref.handlers.CGIHandler).
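As an illustration of where such a fix could live, the sketch below moves the IIS PATH_INFO correction into a wsgiref handler by overriding setup_environ. PathFixingCGIHandler is a hypothetical name. It derives from BaseCGIHandler only so that the streams and environ can be passed in explicitly; a real CGI script would use something like CGIHandler().run(app). The check against a trailing '/' is an added assumption, so that a SCRIPT_NAME of /app does not eat the front of /application.

```python
from wsgiref.handlers import BaseCGIHandler


class PathFixingCGIHandler(BaseCGIHandler):
    """Hypothetical CGI handler that undoes IIS's repeated SCRIPT_NAME."""

    wsgi_run_once = True
    os_environ = {}  # take the environ only from the constructor argument

    def setup_environ(self):
        super().setup_environ()
        env = self.environ
        if env.get('SERVER_SOFTWARE', '').startswith('Microsoft-IIS'):
            script = env.get('SCRIPT_NAME', '')
            path = env.get('PATH_INFO', '')
            # IIS reports PATH_INFO as SCRIPT_NAME + the real path info.
            if (path + '/').startswith(script + '/'):
                env['PATH_INFO'] = path[len(script):]
```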

> Servers with REQUEST_URI can at least attempt to
> reconstruct the encoded values.

This is slightly unsafe. It's something an application might want to do 
(or at least provide as an option), but a gateway probably couldn't get 
away with it in the general case, because REQUEST_URI doesn't reflect 
internal redirections done by a RewriteRule or an ErrorDocument.
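For the application-level option, here is a hedged Python sketch (str.isascii needs Python 3.7+). The function name and both consistency checks are assumptions, not an established technique. It refuses to use REQUEST_URI when it does not start with SCRIPT_NAME, as after a RewriteRule or ErrorDocument, or when an ASCII PATH_INFO disagrees with it. A rewrite that maps one non-ASCII path under the same script onto another would still slip through, which is why a gateway can't rely on this.

```python
from urllib.parse import unquote_to_bytes


def path_info_from_request_uri(environ):
    """Rebuild PATH_INFO bytes from REQUEST_URI, or return None if unsafe."""
    uri = environ.get('REQUEST_URI')
    if uri is None:
        return None
    # Drop the query string, then undo the %-encoding to get raw bytes.
    raw_path = unquote_to_bytes(uri.split('?', 1)[0])
    try:
        script = environ.get('SCRIPT_NAME', '').encode('ascii')
    except UnicodeEncodeError:
        return None  # non-ASCII SCRIPT_NAME: no reliable prefix to compare
    if not (raw_path + b'/').startswith(script + b'/'):
        return None  # e.g. a RewriteRule mapped /pretty/url onto the script
    path_info = raw_path[len(script):]
    current = environ.get('PATH_INFO', '')
    if current.isascii() and current.encode('ascii') != path_info:
        return None  # a trustworthy ASCII PATH_INFO disagrees with REQUEST_URI
    return path_info
```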

> Cookie is also the one header that can't be safely folded.

There are others, e.g. Authorization. Anyway: folding doesn't happen in 
the HTTP world, so it can be forgotten about.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

