[Web-SIG] WSGI Amendments thoughts: the horror of charsets
Andrew Clover
and-py at doxdesk.com
Fri Nov 14 22:23:35 CET 2008
Ian Bicking wrote:
> This is something messed up with CGI on NT, and whatever server you are
> using, and perhaps the CGI adapter (maybe there's a way to get the raw
> environment without any encoding, for example?)
Python decodes the environ to its own copy (wrapped in os.environ) at
interpreter startup time; there's no way to query the real ‘live’
environment that I know of. It'd require a C extension.
> Honestly I don't know if anyone is doing anything with
> WSGI and Python 3.
I know Graham has done some work on mod_wsgi for 3.0, but no, I don't
know anyone using it in anger.
Is it worth submitting patches to simple_server to make it run on 3.0?
Is it too late to include at this stage anyway? Shipping 3.0 with a
non-functional wsgiref is a bit embarrassing.
> I assume there is some way to get at the bytes in the environment, if not
> then that is a Python 3 bug.
There is not, and this appears to be deliberate.
> I think it might be feasible to support an encoded version of
> SCRIPT_NAME and PATH_INFO for WSGI 2.0 (creating entirely new key names,
> and I don't know of any particular standard to base those names on),
> moving from the two keys to a single REQUEST_URI is not feasible.
That's certainly a possibility, but I feel it's easier to hitch a ride
on the existing header, which despite being non-standard is still quite
widely used.
> I guess you'd probably count segments, try to catch %2f (where the
> segments won't match up), and then double check that the decoded
> REQUEST_URI matches SCRIPT_NAME+PATH_INFO.
I'm currently testing with just the segment counting. It's only
necessary that the segments from SCRIPT_NAME are matched and stripped,
and those are extremely unlikely to contain ‘%2F’ because:
  - there aren't many filesystems that can accept ‘/’ as a filename
    character. RISC OS is the only one I can think of, and it by
    convention swaps ‘/’ and ‘.’ to compensate as it is, so even
    there you couldn't use ‘%2F’;
  - there aren't many webservers that can map a file or alias to a
    path containing ‘%2F’;
  - no-one wants to mount a webapp alias at such a weird name; it's
    only in the section corresponding to PATH_INFO that ‘%2F’ might
    ever be of use in practice.
In the worst case, many applications already know and can strip the URL
at which they're mounted, but unless there's a legitimate ‘%2F’ in their
SCRIPT_NAME it doesn't actually matter.
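For illustration, the segment counting could look roughly like the sketch below. This is not the code I'm testing with; split_raw_path is a hypothetical helper name, and it assumes REQUEST_URI arrives in origin form (a path plus optional query, no scheme or host) and that SCRIPT_NAME contains no encoded slash, as argued above.

```python
def split_raw_path(request_uri, script_name):
    """Return (raw_script_name, raw_path_info), both still percent-encoded.

    Hypothetical helper: counts the '/'-separated segments in the decoded
    SCRIPT_NAME and strips the same number of segments from the raw path
    taken from REQUEST_URI. Assumes SCRIPT_NAME has no '%2F' in it.
    """
    # Drop the query string; only the path part is of interest here.
    raw_path = request_uri.split('?', 1)[0]

    # '/app/sub' has two segments, and an empty SCRIPT_NAME has none.
    segment_count = script_name.count('/')

    # The first element is '' because the path starts with '/'.
    parts = raw_path.split('/')
    raw_script_name = '/'.join(parts[:segment_count + 1])

    # Whatever is left over corresponds to PATH_INFO, with '%2F' intact.
    raw_path_info = raw_path[len(raw_script_name):]
    return raw_script_name, raw_path_info


if __name__ == '__main__':
    print(split_raw_path('/app/sub/a%2Fb/c?x=1', '/app/sub'))
```

Because only the count of segments is used, non-ASCII or otherwise encoded characters in the mount point don't need to be decoded to be matched.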
> frankly IIS is probably less relevant to most developers than CGI.
Er... really?
You and I may not favour it, but it's ≈35% of the world out there, not
something we can afford to ignore IMO.
> So if IIS has problems with PATH_INFO, the WSGI adapter
> (be it CGI or otherwise) should be configured to fix those problems up
> front.
What I'm saying is that neither Apache's nor IIS's behaviour can be
considered clearly correct or wrong at this point, and there is no way a
WSGI adapter living underneath them *can* fix up the differences.
(There is a problem with PATH_INFO that a WSGI adapter *could* clear
up, which is that IIS makes PATH_INFO the entire path including
SCRIPT_NAME. I'm not sure whether it's worth fixing that up in the
adapter layer though... it's possible some frameworks are already
dealing with it, and might even be relying on it!)
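If an adapter did want to do that fix-up, a minimal sketch might be the following. strip_script_name is a hypothetical name, and the prefix check is only a heuristic: it should only be applied when the server is known to behave the IIS way, since elsewhere a PATH_INFO could legitimately begin with the same text as SCRIPT_NAME.

```python
def strip_script_name(environ):
    """Return a copy of environ with SCRIPT_NAME removed from PATH_INFO.

    Hypothetical fix-up for servers that put the whole path, including
    SCRIPT_NAME, into PATH_INFO. The environ is returned unchanged when
    PATH_INFO does not start with SCRIPT_NAME on a segment boundary.
    """
    script_name = environ.get('SCRIPT_NAME', '')
    path_info = environ.get('PATH_INFO', '')

    if script_name and path_info.startswith(script_name):
        rest = path_info[len(script_name):]
        # Only strip on a segment boundary, so '/app' is not taken
        # as a prefix of '/application'.
        if rest == '' or rest.startswith('/'):
            fixed = dict(environ)
            fixed['PATH_INFO'] = rest
            return fixed
    return environ


if __name__ == '__main__':
    env = {'SCRIPT_NAME': '/app.py', 'PATH_INFO': '/app.py/foo'}
    print(strip_script_name(env)['PATH_INFO'])
```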
--
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/