[Web-SIG] Clarifications on Python 3.0 and WSGI.

Robert Brewer fumanchu at aminus.org
Tue Mar 25 17:58:24 CET 2008


Graham Dumpleton wrote:
> 3. When running under Python 3, servers MUST provide CGI HTTP
> variables as strings, decoded from the headers using HTTP standard
> encodings (i.e. latin-1 + RFC 2047)
> 
> Can someone give a practical example of where RFC 2047 fits into this
> and how one is meant to handle it?

Sure. According to RFC 2616 sec 2.2:

   Words of *TEXT MAY contain characters from character sets other than
   ISO-8859-1 only when encoded according to the rules of RFC 2047.

From CP's test suite [1]:

    def ifmatch(self):
        val = cherrypy.request.headers['If-Match']
        cherrypy.response.headers['ETag'] = val
        return repr(val)
    
    ...
    
    # Test RFC-2047-encoded request and response header values
    c = "=E2=84=ABngstr=C3=B6m"
    self.getPage("/headers/ifmatch", [('If-Match', '=?utf-8?q?%s?=' % c)])
    self.assertBody("u'\\u212bngstr\\xf6m'")
    self.assertHeader("ETag", '=?utf-8?b?4oSrbmdzdHLDtm0=?=')

That is, CherryPy-the-app-framework decodes the request header
'If-Match' from '=?utf-8?q?=E2=84=ABngstr=C3=B6m?=' to
u'\u212bngstr\xf6m'. See [2] for where that happens. PEP 333 only
talks about RFC 2047 encoding, not decoding, and also says "All
encoding/decoding must be handled by the application", so we made the CP
WSGI server pass 2047-encoded request headers through unmodified.
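
For anyone who wants to try the decoding step outside CherryPy, here is a
minimal sketch using only the standard library's email.header module. It
is written for Python 3, not taken from CP's code; the helper name
decode_rfc2047 is made up for illustration, and falling back to latin-1
for chunks with no declared charset is my assumption, based on the
RFC 2616 wording above.

```python
from email.header import decode_header


def decode_rfc2047(value):
    """Decode an RFC 2047 header value into a str (hypothetical helper).

    decode_header() returns a list of (chunk, charset) pairs. A chunk is
    bytes when it came from an encoded-word, and str when the value
    contained no encoded-words at all.
    """
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Assumption: chunks without a declared charset are latin-1.
            parts.append(chunk.decode(charset or "latin-1"))
        else:
            parts.append(chunk)
    return "".join(parts)


# The Q-encoded request header and the B-encoded response header from the
# test above both carry the same text.
q_form = decode_rfc2047("=?utf-8?q?=E2=84=ABngstr=C3=B6m?=")
b_form = decode_rfc2047("=?utf-8?b?4oSrbmdzdHLDtm0=?=")
print(ascii(q_form), q_form == b_form)
```

A value with no encoded-words comes back unchanged, so a helper like this
could be applied to every header an application cares about.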

FYI, there's been a lot of talk lately on the http-bis WG about using
some mechanism other than RFC 2047 in the future.


Robert Brewer
fumanchu at aminus.org

[1] http://www.cherrypy.org/browser/trunk/cherrypy/test/test_core.py#L867
[2] http://www.cherrypy.org/browser/trunk/cherrypy/_cprequest.py#L620


