How to display unicode with the CGI module?

greg greg at cosc.canterbury.ac.nz
Mon Nov 26 03:09:44 EST 2007


paul wrote:
> However, this will change in py3k..., 
> what's the new rule of thumb?

In py3k, the str type will be what unicode is now, and there
will be a new type called bytes for holding binary data --
including text in some external encoding. These two types
will not be compatible.
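
As a rough sketch of what that split looks like in practice,
assuming Python 3 as eventually released (the sample string and
variable names are only illustrations):

```python
text = "caf\u00e9"               # str: four Unicode code points
data = text.encode("utf-8")      # bytes: the same text encoded as UTF-8

# The two objects have different types and different lengths,
# because the accented letter takes two bytes in UTF-8.
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))

# A str and a bytes object never compare equal, even when they
# represent the same text.
print(text == data)

# Decoding with the matching encoding recovers the original str.
print(data.decode("utf-8") == text)
```

The point is only that the conversion is always explicit: you
name an encoding whenever you cross from one type to the other.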

At the lowest level, reading a file will return bytes, which
then have to be decoded to produce a (unicode) str, and a str
will have to be encoded into bytes before being written to a
file.
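
A minimal sketch of that low-level round trip, assuming Python 3
and UTF-8 as the external encoding (the temporary file and the
sample text are hypothetical, chosen only for illustration):

```python
import os
import tempfile

text = "na\u00efve caf\u00e9"    # a str containing non-ASCII characters

# Create a scratch file to work with (illustrative only).
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    # Writing in binary mode: the str must be encoded to bytes first.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))

    # Reading in binary mode returns bytes, not str.
    with open(path, "rb") as f:
        raw = f.read()

    # Decoding the bytes produces a (unicode) str again.
    decoded = raw.decode("utf-8")
finally:
    os.remove(path)

print(type(raw).__name__, type(decoded).__name__)
print(decoded == text)
```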

There will be wrappers for text files that perform the
decoding and encoding automatically, but they will need to
be set up to use a specified encoding if you're dealing
with anything other than ASCII. (It may be possible to
set up a system-wide default, but I'm not sure.)
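
For illustration, here is how such wrappers can be set up in
Python 3 as released, with the encoding passed explicitly. This
is a sketch, not the only way to do it; the file, the sample
strings and the choice of UTF-8 and Latin-1 are assumptions made
for the example:

```python
import io
import os
import tempfile

text = "na\u00efve caf\u00e9"

# Scratch file for the example (illustrative only).
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    # Text mode with an explicit encoding: str goes in, and the
    # wrapper encodes it to bytes on the way to disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading in text mode decodes automatically and returns str.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
finally:
    os.remove(path)

# The same wrapping can be applied to any binary stream.
buffer = io.BytesIO("gr\u00fc\u00df".encode("latin-1"))
wrapped = io.TextIOWrapper(buffer, encoding="latin-1")
from_wrapper = wrapped.read()

print(round_trip == text)
print(from_wrapper == "gr\u00fc\u00df")
```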

So you won't be able to get away with ignoring encoding
issues in py3k. On the plus side, it should all be handled
in a much more consistent and less error-prone way. If
you mistakenly try to use encoded data as though it were
decoded data or vice versa, you'll get a TypeError.
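
A small sketch of the kind of mistake that gets caught, assuming
Python 3 as released (the `errors` list is just a hypothetical
way to collect the messages for display):

```python
import io

text = "caf\u00e9"               # decoded data (str)
data = text.encode("utf-8")      # encoded data (bytes)

errors = []

# Concatenating str and bytes is rejected.
try:
    text + data
except TypeError as exc:
    errors.append(str(exc))

# Writing a str to a binary stream is rejected.
try:
    io.BytesIO().write(text)
except TypeError as exc:
    errors.append(str(exc))

# Writing bytes to a text stream is rejected as well.
try:
    io.StringIO().write(data)
except TypeError as exc:
    errors.append(str(exc))

for message in errors:
    print("TypeError:", message)
```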

--
Greg


