[Web-SIG] String Types in WSGI [Graham's WSGI for py3]

Fri Sep 18 10:12:38 CEST 2009

On Fri, Sep 18, 2009 at 8:56 AM, Graham Dumpleton
<graham.dumpleton at gmail.com> wrote:
>> The big problems are always PATH_INFO and SCRIPT_NAME.  Those are the
>> only values that are in the dict URL-decoded and might contain non-ASCII
>> characters (except for headers, but that's a different story because
>> the only real-world problem there is cookie headers, and those are
>> troubling for more reasons than just character sets).
>>
>> My latest change to the WSGI sandbox hg repo [2] was that I added a
>> notice that later PEP revisions might document a RAW_SCRIPT_NAME or
>> something that contains the URL quoted values.  It however turns out
>> that this value is not available from within a webserver context (We're
>> talking about Apache and IIS here) so that the problem of unquoted
>> values will not go away.
>
> I am still waiting for the good explanation of why access to the raw
> URL quoted values is so important. Can you please explain what the
> requirement is?
>
> The only example I recall was related to web servers eliminating
> repeating slashes thereby effectively not making it possible to have
> URLs in query strings without a custom encoding string. Since there
> are alternatives, I don't find that alone a compelling argument.
>

Why is the raw URL needed (very rarely)?

Sometimes there are bugs.  Access to the raw string lets you work
around those bugs... if you need to.  Dropping to a lower level is
needed sometimes.

Some APIs require you to send back an exact copy of the input url.  Or
sometimes you want to know what input url was used... not the cleaned
up version of it.  Sometimes clients calling the wsgi code will be
buggy... and looking at the raw url, exactly as the client sent it, is
needed in those cases to work around buggy clients.
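
To make the "exact copy" point concrete, here is a small sketch (the
example paths are made up, and it uses only the stdlib urllib.parse
functions, not anything a particular server or the PEP defines).  It shows
that URL-decoding is lossy: two different raw paths collapse to the same
decoded value, so an application that only sees the decoded PATH_INFO
cannot rebuild what the client actually sent.

```python
from urllib.parse import quote, unquote

# Two distinct raw request paths (hypothetical examples).
raw_a = "/files/a%2Fb"   # one segment named "a/b", slash percent-encoded
raw_b = "/files/a/b"     # two segments, "a" and "b"

# What an application would see if it were only given decoded values.
decoded_a = unquote(raw_a)
decoded_b = unquote(raw_b)

# After decoding, the two requests are indistinguishable.
same_after_decoding = decoded_a == decoded_b

# Re-quoting the decoded value does not recover the original raw path,
# because quote() leaves "/" alone by default.
requoted = quote(decoded_a)
round_trips = requoted == raw_a

print(decoded_a, decoded_b, same_after_decoding, round_trips)
```

So if an API needs the input url echoed back byte for byte, re-quoting
PATH_INFO is only a guess at what came in.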