[Web-SIG] WSGI 2: Decoding the Request-URI

Mon Aug 17 16:37:43 CEST 2009

I wrote:
> Applications do produce URI's (and IRI's, etc. that need to be
> converted into URI's) and do transfer them in media types like
> HTML, which define how to encode a.href's and form.action's
> before %-encoding them [4]. But these are not the only vectors
> by which clients obtain or generate Request-URI's.
> ...
> As someone (Alan Kennedy?) noted at PyCon, static resources may
> depend upon a filename encoding, defined by the OS, which
> differs from that of the rest of the URI's generated/understood
> by even the most coherent application.
> ...
> "In practical terms, character-by-character comparisons should be
> done codepoint-by-codepoint after conversion to a common character
> encoding." In other words, the URI spec seems to imply that the
> two URI's "/a%c3%bf" and "/a%ff" may be equivalent, if the former
> is u"/a\u00FF" encoded in UTF-8 and the latter is u"/a\u00FF"
> encoded in ISO-8859-1. Note that WSGI 1.0 cannot speak about
> this, since all environ values must be byte strings. IMO WSGI
> 2 should do better in this regard.
> ...
> For the three reasons above, I don't think we can assume that the
> application will always receive equivalent URI's encoded in a
> single, foreseen encoding.

Did I say 3 reasons? I meant 4: Accept-Charset.
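
To make the "/a%c3%bf" versus "/a%ff" example quoted above concrete, here
is a minimal sketch in present-day Python 3 (not the Python 2 that WSGI 1.0
targets). The helper name decode_path is hypothetical, and the encodings are
passed in by hand, on the assumption that something outside the Request-URI
tells us which one the client used; the URI itself does not say.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw_path, encoding):
    """Hypothetical helper: undo %-encoding, then decode the bytes as text."""
    return unquote_to_bytes(raw_path).decode(encoding)


# The same character, U+00FF, sent by two clients using different encodings.
utf8_form = decode_path("/a%c3%bf", "utf-8")
latin1_form = decode_path("/a%ff", "iso-8859-1")

# Compared codepoint-by-codepoint after decoding, the two paths are equal...
same_text = utf8_form == latin1_form == "/a\u00ff"

# ...but compared as raw bytes (all WSGI 1.0 can see), they are not.
same_bytes = unquote_to_bytes("/a%c3%bf") == unquote_to_bytes("/a%ff")

print(same_text, same_bytes)
```

Note also that the bytes of the second form are not valid UTF-8 at all, so
a server that assumes UTF-8 everywhere would fail to decode "/a%ff" rather
than merely treat it as a different resource.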

Robert Brewer
fumanchu at aminus.org