[Web-SIG] WSGI, Python 3 and Unicode

Alan Kennedy pywebsig at xhaus.com
Fri Dec 7 12:24:03 CET 2007


[Alan]
>> The restriction to iso-8859-1 is really a distraction; iso-8859-1 is
>> used simply as an identity encoding that also enforces that all
>> "bytes" in the string have a value from 0x00 to 0xff, so that they are
>> suitable for byte-oriented IO. So, in output terms at least, WSGI *is*
>> a byte-oriented protocol. The problem is that python-the-language
>> didn't have support for bytes at the time WSGI was designed.

[Thomas]
> If you're talking about the "output stream", then yes, it's all about
> bytes (or should be).

Indeed, I was only talking about output, specifically the response body.
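
To make the "identity encoding" point concrete, here is a minimal sketch in
Python 3. It is an illustration only, not anything from the WSGI spec, and
the helper name to_wire is hypothetical.

```python
# iso-8859-1 maps byte value N to code point N for every N in 0x00..0xff,
# so decoding and re-encoding never changes the data.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("iso-8859-1")

assert [ord(ch) for ch in as_text] == list(range(256))
assert as_text.encode("iso-8859-1") == all_bytes


def to_wire(s):
    """Hypothetical helper: turn a "byte-like" string into real bytes.

    Raises UnicodeEncodeError if any character is above 0xff, which is
    exactly the enforcement described above.
    """
    return s.encode("iso-8859-1")


def fits_in_bytes(s):
    """Return True if every character of s has a value from 0x00 to 0xff."""
    try:
        to_wire(s)
        return True
    except UnicodeEncodeError:
        return False
```

With this, fits_in_bytes("caf\xe9") is true, while a string containing
"\u20ac" (the euro sign) is rejected because it has no single-byte value.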

> But at the status and headers level, HTTP/1.1 is
> fundamentally ISO-8859-1-encoded.

Agreed.

That is why the WSGI spec also states:

"""
Note also that strings passed to start_response() as a status or as
response headers must follow RFC 2616 with respect to encoding. That
is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME
encoding.
"""

So in order to use non-ISO-8859-1 characters in response status
strings or headers, you must use RFC 2047.
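
As a rough sketch of what that could look like in Python 3, the standard
library's email.header module can produce RFC 2047 encoded words. The
function name header_value and the choice of utf-8 as the charset are my
assumptions for illustration; nothing in WSGI mandates them, and the sketch
assumes short values, since long ones may be folded across lines.

```python
from email.header import Header, decode_header


def header_value(text):
    """Hypothetical helper: return text unchanged if it is representable
    in ISO-8859-1, otherwise return an RFC 2047 encoded word."""
    try:
        text.encode("iso-8859-1")
        return text
    except UnicodeEncodeError:
        # Assumption: utf-8 is used as the charset of the encoded word.
        return Header(text, charset="utf-8").encode()


def decode_value(value):
    """Reverse of header_value, for checking the round trip."""
    parts = []
    for raw, charset in decode_header(value):
        if isinstance(raw, bytes):
            parts.append(raw.decode(charset or "iso-8859-1"))
        else:
            parts.append(raw)
    return "".join(parts)
```

The encoded result contains only ASCII characters, so it passes the
ISO-8859-1 requirement for strings handed to start_response().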

As confirmed by the links you posted, this is an HTTP restriction, not
a WSGI restriction.

Regards,

Alan.

