[Web-SIG] WSGI, Python 3 and Unicode

Thomas Broyer t.broyer at gmail.com
Fri Dec 7 11:16:13 CET 2007


I wasn't there when PEP 333 was written, nor have I been involved in
any Python development, but here are my thoughts:

2007/12/7, Alan Kennedy:
>
> I think it's worth pointing out the reason for the current restriction
> to iso-8859-1 is *because* python did not have a bytes type at the
> time the WSGI spec was drawn up. IIRC, the bytes type had not yet even
> been proposed for Py3K. CPython effectively held all byte sequences as
> strings, a paradigm which is (still) followed by Jython (not sure
> about IronPython).
>
> The restriction to iso-8859-1 is really a distraction; iso-8859-1 is
> used simply as an identity encoding that also enforces that all
> "bytes" in the string have a value from 0x00 to 0xff, so that they are
> suitable for byte-oriented IO. So, in output terms at least, WSGI *is*
> a byte-oriented protocol. The problem is that python-the-language
> didn't have support for bytes at the time WSGI was designed.
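To make the "identity encoding" point concrete, here is a minimal Python 3
sketch, assuming the bytes/str split this thread is about. ISO-8859-1 maps
byte 0xNN to code point U+00NN, so every byte value round-trips losslessly,
while any character above U+00FF is rejected at encode time. The helper name
to_wire is hypothetical, just for illustration.

```python
# ISO-8859-1 ("latin-1") as an identity mapping between bytes and str.
all_bytes = bytes(range(256))

# Decoding maps each byte 0xNN to the code point U+00NN...
text = all_bytes.decode("iso-8859-1")
assert [ord(c) for c in text] == list(range(256))

# ...and encoding maps each code point straight back: a lossless round trip.
assert text.encode("iso-8859-1") == all_bytes


def to_wire(s):
    """Hypothetical helper: return the bytes for s.

    Raises UnicodeEncodeError if s holds a code point above U+00FF,
    i.e. something that is not a single "byte" in the latin-1 sense.
    """
    return s.encode("iso-8859-1")
```

For example, to_wire("caf\xe9") gives b"caf\xe9", whereas to_wire("\u20ac")
raises UnicodeEncodeError.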

If you're talking about the "output stream", then yes, it's all about
bytes (or should be). But at the level of the status line and headers,
HTTP/1.1 is fundamentally ISO-8859-1-encoded.

See:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2 (the note
about *TEXT)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2
(field-content is *TEXT, among other things)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6.1
(Reason-Phrase is *TEXT)
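A rough sketch of what this means for a server serializing the status and
headers it receives from start_response, again assuming Python 3 str/bytes:
the whole head is encoded as ISO-8859-1, so a code point above U+00FF fails
loudly instead of being silently mangled. serialize_head is a hypothetical
helper; it does not apply the RFC 2047 encoding that RFC 2616 permits for
other character sets, nor does it check for stray CR/LF in values.

```python
def serialize_head(status, headers):
    """Hypothetical helper: return the HTTP/1.1 status line and headers as bytes.

    status:  a WSGI status string such as "200 OK"
    headers: a list of (name, value) str pairs, as passed to start_response
    """
    lines = ["HTTP/1.1 " + status]
    for name, value in headers:
        # No check for CR/LF inside values; a real server must reject them.
        lines.append(name + ": " + value)
    head = "\r\n".join(lines) + "\r\n\r\n"
    # RFC 2616 *TEXT is ISO-8859-1: characters above U+00FF cannot be sent
    # as-is, so encode() raises UnicodeEncodeError for them.
    return head.encode("iso-8859-1")
```

For example, serialize_head("200 OK", [("Content-Type", "text/plain")])
returns b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\n", and a
Reason-Phrase such as "200 Tr\xe8s bien" is accepted because U+00E8 is in
ISO-8859-1.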

-- 
Thomas Broyer

