[Web-SIG] PEP 444 (aka Web3)

And Clover and-py at doxdesk.com
Fri Sep 17 15:43:21 CEST 2010


On 09/17/2010 02:03 PM, Armin Ronacher wrote:

> In case we change the spec as Ian mentioned above, I am all for
> a "wsgi.guessed_encoding" = True flag or something like that.

Yes, I'd like to see that. I believe going with *only* a
raw-or-reconstructed path_info, rather than having both path_info and
PATH_INFO, is probably best, for the middleware-duplication reasons PJE
mentioned.

A more in-depth possibility might be:

wsgi.path_accuracy =

     0: script_name/path_info have been crudely reconstructed from
     SCRIPT_NAME/PATH_INFO from an unknown source. Beware!
     If there is to be backwards compatibility with WSGI1, this
     would be seen as the 'default value' given a missing path_accuracy.

     1: script_name/path_info have been reconstructed, but it is known
     that path_info is accurate, other than %2F and non-ASCII issues.
     That is, it's known that the path doesn't come from IIS's broken
     PATH_INFO, or the IIS error has been detected and compensated for.

     2: script_name/path_info have been reconstructed using known-good
     encodings for the env. The only way in which they may differ from
     the original request path is that a slash might originally have
     been a %2F. (This is good enough for the vast majority of
     applications.)

     3: script_name/path_info come directly from the request path
     without any intervening mangling.
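
As a rough sketch of how an application might consume such a flag: the
key name and the four levels are taken from the proposal above and are
not part of any published spec, and the helper names are made up for
illustration. A missing key is treated as 0, per the backwards
compatibility note.

```python
# Hypothetical key from the proposal above; not defined by any spec.
PATH_ACCURACY_KEY = "wsgi.path_accuracy"


def path_accuracy(environ):
    """Return the declared accuracy level, treating a missing key as 0."""
    return int(environ.get(PATH_ACCURACY_KEY, 0))


def can_trust_non_ascii(environ):
    """Levels 2 and 3 promise the path was built with known-good encodings."""
    return path_accuracy(environ) >= 2


def can_distinguish_encoded_slash(environ):
    """Only level 3 preserves the difference between '/' and '%2F'."""
    return path_accuracy(environ) >= 3
```

An application that routes on non-ASCII path segments could check
can_trust_non_ascii() and fall back to a more defensive decode when it
returns False.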

> Unless I am mistaken, the same is true for CGI scripts running on
> Apache2 on Windows.

Yes, it's true of *all* CGI scripts, and also of non-CGI scripts on IIS.

> I did some tests a while ago and was pretty sure that Apache2 on Windows
> did the same.

Apache-on-Windows puts the bytes of the decoded path into the 
environment variables as one code unit per byte: that is, as if encoded 
by ISO-8859-1. You still have to read the environ using ctypes because 
mbcs is never ISO-8859-1, but at least the original bytes are 
recoverable, which isn't the case with IIS.
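
To illustrate the recovery step, here is a minimal sketch. It assumes
the variable has already been read as Unicode text with one code unit
per original byte; the function names are hypothetical, and the UTF-8
guess with an ISO-8859-1 fallback is my assumption, not something
Apache tells you. The ctypes helper is Windows-only and matters mainly
where os.environ has been decoded through mbcs.

```python
def read_env_unicode(name):
    """Windows only: read an environment variable via the wide-char API.

    Returns None when the variable is missing or empty.
    """
    import ctypes

    kernel32 = ctypes.windll.kernel32
    # First call asks for the required buffer size, including the NUL.
    size = kernel32.GetEnvironmentVariableW(name, None, 0)
    if size == 0:
        return None
    buf = ctypes.create_unicode_buffer(size)
    kernel32.GetEnvironmentVariableW(name, buf, size)
    return buf.value


def recover_path_bytes(value):
    """Map each code unit back to the byte it stands for.

    ISO-8859-1 maps U+0000..U+00FF to the identical byte values.
    """
    return value.encode("iso-8859-1")


def decode_path(value, encoding="utf-8"):
    """Decode the recovered bytes, falling back to ISO-8859-1 on failure."""
    raw = recover_path_bytes(value)
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")
```

For example, a request for /caf%C3%A9 would arrive as the six
characters '/', 'c', 'a', 'f', U+00C3, U+00A9, which decode_path()
turns back into '/café'.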

> The correct place for these hacks would be the appropriate WSGI/Web3
> handler of the webserver.

The IIS PATH_INFO-prefix hack would be appropriate to put in an 
IIS-specific handler; indeed, I believe isapi_wsgi does just that. But 
the other hacks are specific to CGI.
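
A sketch of what such a prefix fixup could look like, assuming the IIS
behaviour where PATH_INFO repeats SCRIPT_NAME as a prefix. This is not
isapi_wsgi's actual code, the function name is made up, and the
heuristic misfires if a legitimate PATH_INFO happens to begin with the
script name.

```python
def strip_iis_path_info_prefix(environ):
    """Return a copy of environ with SCRIPT_NAME stripped from PATH_INFO.

    The environ is returned unchanged when PATH_INFO does not start
    with SCRIPT_NAME on a path-segment boundary.
    """
    script_name = environ.get("SCRIPT_NAME", "")
    path_info = environ.get("PATH_INFO", "")
    if script_name and path_info.startswith(script_name):
        rest = path_info[len(script_name):]
        # Only strip on a segment boundary, so '/app.pyx' is left alone.
        if rest == "" or rest.startswith("/"):
            fixed = dict(environ)
            fixed["PATH_INFO"] = rest
            return fixed
    return environ
```

Because only the server-specific handler knows it is running under IIS,
this belongs there rather than in a framework further up the stack.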

For CGI, there is no 'handler of the webserver', there is only the 
standard CGI-to-WSGI adapter, so this is the only component it is 
reasonable to burden with the hacks. Frameworks and libraries further up 
the stack cannot reliably do the fixups, because they don't know whether 
the WSGI environ they have been given comes from os.environ or somewhere 
else, or whether middleware has played with it.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/