[Web-SIG] bytes, strings, and Unicode in Jython, IronPython, and CPython 3.0

Wed Sep 15 18:01:25 CEST 2004

At 12:33 PM 9/15/04 +0200, Paul Boddie wrote:
>Phillip J. Eby wrote:
> > So, here's what I propose to do about the open issue in PEP 333.
> > Servers and gateways that run under Python implementations where all
> > strings are Unicode (e.g. Jython) *may*:
> >
> >   * accept Unicode statuses and headers, so long as they properly
> > encode them for transmission (latin-1 + RFC 2047)
>
>I think I encode all Unicode objects used in this area as US-ASCII in
>WebStack.
>
> >   * accept Unicode for response body segments, so long as each segment
> > may be encoded as latin-1 (i.e. only uses chars 0-255)
>
>It should be possible to be more intelligent about response bodies, but
>you can argue that it isn't up to something like WSGI to go through the
>necessary gymnastics to make sure that Unicode objects presented to the
>response stream become encoded appropriately.
>
> >   * produce Unicode input headers and body strings by decoding from
> > latin-1, as long as the produced values are considered type 'str' for
> > that Python implementation.
>
>I think I've left incoming headers as plain strings, but I suppose a
>similar translation could be performed in WebStack.

You only need to worry about these things in WebStack if it's running under 
conditions where 'str' objects may contain any Unicode 
character.  Currently that's only Jython, and maybe IronPython.  As far as 
I know, CPython's -U option is broken; that is, not all of the Python 
stdlib works correctly with Unicode 'str' objects, so for the time being 
it's unlikely you'll need to worry about any of this under CPython.
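
For illustration only, here is a rough sketch of what the three quoted permissions might look like on an implementation where 'str' can hold any Unicode character. The function names are hypothetical and are not part of PEP 333 or WebStack. The sketch assumes plain latin-1 is enough and leaves out the RFC 2047 step for header text that falls outside latin-1.

```python
# Hypothetical helpers for a server whose 'str' type is Unicode.
# None of these names come from PEP 333; they are illustrative only.


def encode_header_value(value):
    """Encode a Unicode status or header value as latin-1 for the wire.

    Raises UnicodeEncodeError for characters above U+00FF. A real server
    would have to apply RFC 2047 encoding in that case; this sketch
    does not attempt it.
    """
    return value.encode("latin-1")


def encode_body_segment(segment):
    """Encode one response body segment as latin-1.

    Only characters 0-255 are allowed, so anything else raises
    UnicodeEncodeError instead of being silently altered.
    """
    return segment.encode("latin-1")


def decode_input(raw):
    """Decode incoming header or body bytes into a Unicode 'str'.

    latin-1 maps every byte value to exactly one code point, so this
    never fails and can be reversed with encode("latin-1").
    """
    return raw.decode("latin-1")
```

Decoding input as latin-1 is lossless in both directions, so an application that knows the real encoding can always get the original bytes back and decode them itself.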