[Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0

Wed Sep 15 12:33:31 CEST 2004

Phillip J. Eby wrote:
>
> I've reviewed last month's Python-Dev discussion about the future Python
> 'bytes()' type, and the eventual transition away from Python's current
> 8-bit strings.
> 
> Mainly, the impression I get is that significant change in this respect
> really can't happen until Python 3.0, because too many things have to
> change at once for it to work.

I think there was (and perhaps still is) a runtime option to force Python
to treat all strings as Unicode objects.

> So, here's what I propose to do about the open issue in PEP 333. Servers
> and gateways that run under Python implementations where all strings are
> Unicode (e.g. Jython) *may*:
> 
>   * accept Unicode statuses and headers, so long as they properly encode
> them for transmission (latin-1 + RFC 2047)

I think I encode all Unicode objects used in this area as US-ASCII in
WebStack.
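
As a rough illustration of the "latin-1 + RFC 2047" option quoted above
(this is not WebStack's or any server's actual code), a gateway running on
a Python where str is Unicode might do something like the following. The
helper name and the choice of UTF-8 inside the encoded-word are assumptions
made for the sketch.

```python
from email.header import Header


def encode_header_value(value):
    """Hypothetical helper: turn a text header value into bytes for the wire."""
    try:
        # Characters 0-255 map one-to-one onto latin-1 bytes.
        return value.encode("latin-1")
    except UnicodeEncodeError:
        # Anything else is wrapped in an RFC 2047 encoded-word, which is
        # pure ASCII. UTF-8 as the inner charset is an assumption here.
        return Header(value, "utf-8").encode().encode("ascii")
```

Long values may be folded across several lines by the email package, so a
real gateway would need to decide how to handle that.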

>   * accept Unicode for response body segments, so long as each segment may
> be encoded as latin-1 (i.e. only uses chars 0-255)

It should be possible to be more intelligent about response bodies, but one
can argue that it isn't the job of something like WSGI to go through the
gymnastics needed to make sure that Unicode objects written to the response
stream are encoded appropriately.
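
For comparison, the quoted rule for body segments is about as simple as it
gets: encode as latin-1 and refuse anything outside 0-255. A minimal sketch,
assuming a Python where str is Unicode and using a made-up function name:

```python
def encode_body_segment(segment):
    """Hypothetical helper: make a response body segment safe to write."""
    if isinstance(segment, bytes):
        # Already bytes, pass through untouched.
        return segment
    # Raises UnicodeEncodeError if any character is above U+00FF, which is
    # the "only uses chars 0-255" condition from the proposal.
    return segment.encode("latin-1")
```

Anything smarter, such as honouring the charset in Content-Type, would be
left to the application or framework under this reading.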

>   * produce Unicode input headers and body strings by decoding from
> latin-1, as long as the produced values are considered type 'str' for that
> Python implementation.

I think I've left incoming headers as plain strings, but I suppose a similar
translation could be performed in WebStack.
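
Such a translation would presumably look something like this sketch (the
function name is hypothetical). Decoding from latin-1 never fails and is
reversible, which is what makes it a safe default for input:

```python
def decode_header_value(raw):
    """Hypothetical helper: decode an incoming header value to text."""
    # Every byte 0-255 maps to the code point with the same number, so no
    # input can raise an error and re-encoding gives back the same bytes.
    return raw.decode("latin-1")
```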

Paul