[Web-SIG] WSGI 2

P.J. Eby pje at telecommunity.com
Tue Aug 4 17:46:14 CEST 2009


At 02:28 PM 8/4/2009 +1000, Graham Dumpleton wrote:
>2009/8/4 P.J. Eby <pje at telecommunity.com>:
> > I'm not clear on your logic here.  If I request foo/bar/baz (where baz
> > actually has an accent over the 'a') in latin-1 encoding, and foo/bar
> > is the script, then the (accented) baz is legitimate for pass-through
> > to the application, no?
>
>Technically, but what I am pointing out is that Apache pretty well
>says that foo/bar needs to be UTF-8.

Which doesn't change the fact that you haven't yet proposed what a 
WSGI server should *do* with such non-UTF8 bytes in PATH_INFO and 
QUERY_STRING.  Apache can and does pass through such bytes, so the 
spec needs to say what we do with them.
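
To make the problem concrete, here is a minimal Python 3 sketch of such a
request.  It is only an illustration under stated assumptions: the path is
the made-up foo/bar/baz example from above, `decode_path` is a hypothetical
helper, and the latin-1 fallback is just one possible policy, not something
any spec or server is known to mandate.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path):
    """Hypothetical helper: percent-decode a path, then try UTF-8 first."""
    raw_bytes = unquote_to_bytes(raw_path)
    try:
        return raw_bytes.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback policy: latin-1 maps every byte to a code point,
        # so this decode cannot fail and the original bytes stay recoverable.
        return raw_bytes.decode("latin-1"), "latin-1"


# 'a' with an acute accent is the single byte E1 in latin-1 ...
latin1_result = decode_path("/foo/bar/b%E1z")
# ... and the two bytes C3 A1 in UTF-8.
utf8_result = decode_path("/foo/bar/b%C3%A1z")
```

Both calls yield the same text, but only the second request is valid UTF-8;
the first is exactly the kind of input whose handling has to be defined.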


>  If you are going to have
>different parts of the one URL needing a different encoding to be
>understood, personally I would say you are asking for trouble. So, I am
>saying that UTF-8 really needs to apply, more for the sake of sanity and
>portability.

So what, precisely, are you proposing should happen when such bytes 
are present?


>So I guess the problem is more where URLs are already % encoded when
>coming back as an href or form action, because they may be in an encoding
>incompatible with UTF-8 if they were to be clicked on.

Yep, that's the case with "standard" browsers and servers; 
less-standard situations such as spiders and scripts generating or 
following URLs are also relevant, as are deliberate hack 
attempts.  So having the result of this behavior be undefined is a bad thing.


>The Apache server at least will decode those % escape sequences and I
>believe it is the result of that which is used in stuff like rewrite
>rule matches, not the raw URL. The only exception would be if a rewrite
>rule explicitly matched against the REQUEST_URI variable, which still
>contains % escape sequences. So if not in UTF-8, that effectively means
>you can't match them with Apache rewrite rules.

That's got nothing to do with what you propose for WSGI to do with 
the rest of it, though.

(However, your belief may be incorrect in any event, as this page:

    http://www.dracos.co.uk/code/apache-rewrite-problem/

claims that mod_rewrite can RewriteCond on THE_REQUEST in order to 
match still-encoded paths.)
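
The distinction between matching the raw request line and matching the
decoded path can be sketched in Python.  This is only an analogy, not
mod_rewrite itself; the request line and the pattern are made-up examples.

```python
import re
from urllib.parse import unquote_to_bytes

# Made-up request line, standing in for what the server first receives.
request_line = "GET /foo/bar/b%E1z HTTP/1.1"

# The raw target still carries the % escape sequence.
raw_target = request_line.split(" ")[1]

# The decoded form; latin-1 is used here only so that every byte maps
# to a character and the result can be compared as text.
decoded_target = unquote_to_bytes(raw_target).decode("latin-1")

# A pattern written against the still-encoded form ...
encoded_pattern = re.compile(r"^/foo/bar/b%E1z$")

matches_raw = bool(encoded_pattern.match(raw_target))
# ... no longer matches once the escapes have been decoded.
matches_decoded = bool(encoded_pattern.match(decoded_target))
```

So a rule written against the still-encoded text only works if it is applied
to the raw request, which is what the page above says THE_REQUEST allows.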


