[Python-Dev] More Unicode support
Guido van Rossum
guido@python.org
Sun, 05 Nov 2000 22:40:33 -0500
[me]
> > - Internationalization. Barry knows what he wants here; I bet Martin
> > von Loewis and Marc-Andre Lemburg have ideas too.
[MAL]
> We'd need a few more codecs, support for the Unicode compression,
> normalization and collation algorithms.
Hm... There's also the problem that there's no easy way to do Unicode
I/O. I'd like to have a way to turn a particular file into a Unicode
output device (where the actual encoding might be UTF-8 or UTF-16 or a
local encoding), which should mean that writing Unicode objects to the
file should "do the right thing" (in particular should not try to
coerce it to an 8-bit string using the default encoding first, like
print and str() currently do) and that writing 8-bit string objects to
it should first convert them to Unicode using the default encoding
(meaning that at least ASCII strings can be written to a Unicode file
without having to specify a conversion). I suppose that reading from
a "Unicode file" should always return a Unicode string object (even if
the actual characters read all happen to fall in the ASCII range).
This requires some serious changes to the current I/O mechanisms; in
particular str() needs to be fixed, or perhaps a ustr() needs to be
added that is used in certain cases. Tricky, tricky!
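
To make the behaviour described above concrete, here is a minimal sketch, not a proposed implementation. It is written in present-day Python 3 terms, where `bytes` stands in for the 8-bit string and `str` for the Unicode object. The class name `UnicodeFile` and its `default_encoding` parameter are hypothetical and exist only for this illustration.

```python
import io


class UnicodeFile:
    """Hypothetical wrapper that turns a binary stream into a Unicode device."""

    def __init__(self, raw, encoding="utf-8", default_encoding="ascii"):
        self.raw = raw                            # underlying binary stream
        self.encoding = encoding                  # encoding used in the file itself
        self.default_encoding = default_encoding  # used to promote 8-bit strings

    def write(self, data):
        if isinstance(data, bytes):
            # 8-bit string: convert to Unicode first, using the default
            # encoding (this raises for bytes outside that encoding).
            data = data.decode(self.default_encoding)
        # Unicode goes straight to the file's encoding; it is never coerced
        # to an 8-bit string via the default encoding on the way.
        self.raw.write(data.encode(self.encoding))

    def read(self):
        # Always returns a Unicode string, even for pure-ASCII content.
        return self.raw.read().decode(self.encoding)


buf = io.BytesIO()
f = UnicodeFile(buf, encoding="utf-16-le")
f.write("caf\u00e9 ")   # Unicode object, written as UTF-16
f.write(b"ascii")       # 8-bit ASCII string, promoted without an explicit conversion
buf.seek(0)
text = f.read()
```

The sketch only covers whole-file reads and ignores buffering, partial characters and line handling, which is where most of the real difficulty would lie. In current Python, `io.TextIOWrapper` and `codecs.getwriter` cover much of the same ground.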
--Guido van Rossum (home page: http://www.python.org/~guido/)