[Web-SIG] Python 3.0 and WSGI 1.0.

Robert Brewer fumanchu at aminus.org
Thu Apr 2 01:22:03 CEST 2009


Graham Dumpleton wrote:
> 2009/4/2 Robert Brewer <fumanchu at aminus.org>:
> > Alan Kennedy wrote:
> >> Hi Graham,
> >>
> >> I think yours is a good solution to the problem.
> >>
> >> [Graham]
> >> > In other words, leave all the existing CGI variables to come through
> >> > as latin-1 decode
> >>
> >> As latin-1 or rfc-2047 decoded, to unicode.
> >>
> >> > and do anything new in 'wsgi' variable namespace,
> >>
> >> So the server provides
> >>
> >> "wsgi.server_decoded_SCRIPT_NAME" == u"whatever"
> >> "wsgi.server_decoded_PATH_INFO" == u"whatever"
> >> "wsgi.server_decode_charset" == u"utf-8"
> >
> > I think everyone at the sprint today acquiesced to having
> > SCRIPT_NAME/PATH_INFO/QUERY_STRING be set in the environ as unicode. The
> > server can decide (probably subject to configuration). I've implemented
> > this in the python3 branch of CherryPy and it seems to work brilliantly.
> > Assuming the server *is* configurable, deployers should be able to
> > choose Latin-1 if they need to recover the original bytes, without
> > having to support a separate set of encoded-byte entries.
> 
> Seems to me that you can't have it be configurable and it must always
> be latin-1 interpretation. The problem is where you are composing
> multiple WSGI applications. If they each have different expectations
> or requirements as to how it is handled, aren't you going to have a
> problem? Or am I missing something in the way you are explaining it?

I would not expect multiple middlewares to want to decode the same URI
differently. But I would assume you'd run into problems when multiple
URIs in the same site had different encodings. Mark Ramm gave the use
case of exposing Unix filenames-as-bytes in URLs: the encoding is
unknown, but a human may know better.
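Part of why Latin-1 keeps coming up: it maps each byte 0x00-0xFF to the code point with the same value, so a Latin-1-decoded PATH_INFO can be turned back into the original bytes and re-decoded by whoever does know the encoding. A minimal sketch, assuming the server stored PATH_INFO as a Latin-1-decoded str; the helper names are hypothetical and not part of WSGI or CherryPy:

```python
def recover_path_bytes(environ):
    """Return the original PATH_INFO bytes from a Latin-1-decoded str.

    Assumes the server decoded PATH_INFO as Latin-1, which is lossless:
    each byte 0x00-0xFF became the code point with the same value.
    """
    return environ["PATH_INFO"].encode("latin-1")


def redecode_path(environ, encoding):
    """Re-decode PATH_INFO with an encoding chosen by the app (or a human).

    'encoding' is supplied by the caller; nothing in the request itself
    tells the server or the app which encoding is correct.
    """
    return recover_path_bytes(environ).decode(encoding)


# A UTF-8 path as it arrives on the wire, stored by a Latin-1-decoding server.
wire_bytes = "/files/caf\u00e9.txt".encode("utf-8")
environ = {"PATH_INFO": wire_bytes.decode("latin-1")}

# The stored str is mojibake, but the original bytes come back intact...
assert recover_path_bytes(environ) == wire_bytes
# ...and the app can apply the encoding it knows to be right.
print(redecode_path(environ, "utf-8"))
```

For the filenames-as-bytes case, an app could skip the second step and pass the recovered bytes straight to the os functions, which accept bytes paths on Unix.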

Allowing/forcing the human to stick that information in the app or in
the server is the same work, IMO. A server could be configurable to the
point of using different encodings for different URIs via regex
matching or <Location> sections or some other means. I'd be happy with a
spec that said, "servers MUST always decode these 3 entries, but SHOULD
allow the encoding used to be configurable." I'd be equally happy with a
spec that said, "servers MUST always decode these 3 as Latin-1" and
explain why. Both have their manageable pros and cons. But delaying the
decoding to the app by setting those 3 entries as bytes has more cons
than pros.
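A server taking the first option might express per-URI encodings as a small rule table. This is a hypothetical sketch: the rule format and the names DECODING_RULES and decode_path are illustrative, not any real server's configuration. It matches the raw path bytes against regexes, decodes with the first matching rule's encoding (UTF-8 if none match), and falls back to Latin-1 if that decode fails, so the original bytes stay recoverable.

```python
import re

# Hypothetical per-URI configuration: (pattern, encoding) pairs checked in
# order against the raw path bytes. Illustrative only.
DECODING_RULES = [
    (re.compile(rb"^/legacy/"), "cp1252"),
    (re.compile(rb"^/files/"), "latin-1"),  # opaque Unix filenames
]
DEFAULT_ENCODING = "utf-8"


def decode_path(raw_path, rules=DECODING_RULES, default=DEFAULT_ENCODING):
    """Decode raw_path (bytes) to str; return (decoded, encoding_used)."""
    encoding = default
    for pattern, candidate in rules:
        if pattern.match(raw_path):
            encoding = candidate
            break
    try:
        return raw_path.decode(encoding), encoding
    except UnicodeDecodeError:
        # Latin-1 can decode any byte sequence, and the app can still get
        # the original bytes back with .encode("latin-1").
        return raw_path.decode("latin-1"), "latin-1"
```

Whether the server should also expose which encoding it used (for example under a wsgi.* key, as in Alan's proposal) is left open here.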


Robert Brewer
fumanchu at aminus.org
