[Python-Dev] More Unicode support

Guido van Rossum guido@python.org
Sun, 05 Nov 2000 22:40:33 -0500


[me]
> > - Internationalization.  Barry knows what he wants here; I bet Martin
> >   von Loewis and Marc-Andre Lemburg have ideas too.

[MAL]
> We'd need a few more codecs, support for the Unicode compression,
> normalization and collation algorithms.

Hm...  There's also the problem that there's no easy way to do Unicode
I/O.  I'd like to have a way to turn a particular file into a Unicode
output device (where the actual encoding might be UTF-8 or UTF-16 or a
local encoding), which should mean that writing Unicode objects to the
file should "do the right thing" (in particular should not try to
coerce it to an 8-bit string using the default encoding first, like
print and str() currently do) and that writing 8-bit string objects to
it should first convert them to Unicode using the default encoding
(meaning that at least ASCII strings can be written to a Unicode file
without having to specify a conversion).  I suppose that reading from
a "Unicode file" should always return a Unicode string object (even if
the actual characters read all happen to fall in the ASCII range).
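
A minimal sketch of what such a "Unicode output device" might look like.
This is only an illustration, not an existing API: the class name
UnicodeFile and its parameters are hypothetical, and it is written in
modern Python terms, where str plays the role of the Unicode object and
bytes the role of the 8-bit string.

```python
import io


class UnicodeFile:
    """Hypothetical wrapper that treats a binary stream as a Unicode device."""

    def __init__(self, raw, encoding="utf-8", default_encoding="ascii"):
        self.raw = raw                            # underlying binary stream
        self.encoding = encoding                  # encoding used on the wire
        self.default_encoding = default_encoding  # used only for 8-bit input

    def write(self, data):
        if isinstance(data, bytes):
            # 8-bit strings are first converted to Unicode using the
            # default encoding; non-ASCII bytes raise UnicodeDecodeError.
            data = data.decode(self.default_encoding)
        # Unicode is encoded directly with the file's own encoding and is
        # never coerced to an 8-bit string via the default encoding.
        self.raw.write(data.encode(self.encoding))

    def read(self):
        # Reading always returns a Unicode string, even for pure ASCII data.
        return self.raw.read().decode(self.encoding)


buf = io.BytesIO()
f = UnicodeFile(buf, encoding="utf-8")
f.write("caf\u00e9 ")       # Unicode object, written as UTF-8
f.write(b"plain ASCII")     # 8-bit string, decoded as ASCII first
buf.seek(0)
text = f.read()
```

Here the 8-bit string needs no explicit conversion because it is pure
ASCII; an 8-bit string containing other bytes would be rejected rather
than guessed at.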

This requires some serious changes to the current I/O mechanisms; in
particular str() needs to be fixed, or perhaps a ustr() needs to be
added that is used in certain cases.  Tricky, tricky!
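
One way to picture the ustr() idea, purely as a sketch: the behaviour
below is an assumption rather than a proposed design, and it again uses
modern Python's str and bytes to stand in for Unicode objects and 8-bit
strings.

```python
def ustr(obj, default_encoding="ascii"):
    """Hypothetical str()-like helper that always yields a Unicode string."""
    if isinstance(obj, str):
        # Already Unicode: return it untouched, with no 8-bit coercion.
        return obj
    if isinstance(obj, bytes):
        # 8-bit string: convert using the default encoding.
        return obj.decode(default_encoding)
    # Anything else: fall back to the object's normal string conversion.
    return str(obj)


accented = ustr("\u00e9")
from_bytes = ustr(b"abc")
from_int = ustr(42)
```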

--Guido van Rossum (home page: http://www.python.org/~guido/)