[Python-Dev] More Unicode support

Guido van Rossum guido@python.org
Mon, 06 Nov 2000 12:20:03 -0500


[GvR]
> > Hm...  There's also the problem that there's no easy way to do Unicode
> > I/O.  I'd like to have a way to turn a particular file into a Unicode
> > output device (where the actual encoding might be UTF-8 or UTF-16 or a
> > local encoding), which should mean that writing Unicode objects to the
> > file should "do the right thing" (in particular should not try to
> > coerce it to an 8-bit string using the default encoding first, like
> > print and str() currently do) and that writing 8-bit string objects to
> > it should first convert them to Unicode using the default encoding
> > (meaning that at least ASCII strings can be written to a Unicode file
> > without having to specify a conversion).  I suppose that reading from
> > a "Unicode file" should always return a Unicode string object (even if
> > the actual characters read all happen to fall in the ASCII range).
> > 
> > This requires some serious changes to the current I/O mechanisms; in
> > particular str() needs to be fixed, or perhaps a ustr() needs to be
> > added that is used in certain cases.  Tricky, tricky!

[MAL]
> It's not all that tricky since you can write a StreamRecoder
> subclass which implements this. AFAIR, I posted such an implementation
> on i18n-sig.
> 
> BTW, one of my patches on SF adds unistr(). Could be that it's
> time to apply it :-)

Adding unistr() and StreamRecoder isn't enough.  The problem is that
when you set sys.stdout to a StreamRecoder, the print statement
doesn't do the right thing!  Try it.  print u"foo" will work, but
print u"\u1234" will fail because print always applies the default
encoding.
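
A minimal sketch of the failure mode, written for a current Python
rather than the interpreter under discussion.  The names print_like
and DEFAULT_ENCODING are hypothetical stand-ins for the coercion that
print performs; they are not part of any real API.

```python
import io

# Assumption: "ascii" stands in for the interpreter's default encoding.
DEFAULT_ENCODING = "ascii"

def print_like(stream, text):
    """Mimic a print that coerces Unicode to an 8-bit string first."""
    # The coercion happens here, before the stream sees anything,
    # so a recoding stream never gets a chance to handle the text.
    stream.write(text.encode(DEFAULT_ENCODING) + b"\n")

raw = io.BytesIO()
print_like(raw, u"foo")            # pure ASCII, so the coercion succeeds

try:
    print_like(raw, u"\u1234")     # not representable in ASCII
    failed = False
except UnicodeEncodeError:
    failed = True
```

The second call raises before anything reaches the stream, which is
why wrapping sys.stdout alone cannot fix it.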

The required changes to print are what's tricky.  Whether we even need
unistr() depends on the solution we find there.
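
For illustration only, here is roughly what a "Unicode output device"
with the semantics quoted at the top could look like in a current
Python.  The class UnicodeOutput and its parameters are hypothetical;
this is a sketch of the desired behaviour, not a proposal for how
print should be changed.

```python
import io

class UnicodeOutput:
    """Hypothetical wrapper turning a byte stream into a Unicode file."""

    def __init__(self, stream, encoding="utf-8", default_encoding="ascii"):
        self.stream = stream
        self.encoding = encoding                  # encoding used on output
        self.default_encoding = default_encoding  # used for 8-bit input

    def write(self, data):
        # 8-bit strings are first converted to Unicode using the
        # default encoding, so plain ASCII needs no explicit conversion.
        if isinstance(data, bytes):
            data = data.decode(self.default_encoding)
        # Unicode is encoded directly, never via the default encoding.
        self.stream.write(data.encode(self.encoding))

raw = io.BytesIO()
out = UnicodeOutput(raw)
out.write(u"\u1234")   # Unicode object: written as UTF-8
out.write(b"foo")      # 8-bit ASCII string: accepted as-is
```

The point of the sketch is that the stream itself must receive the
Unicode object; any caller that coerces to 8 bits beforehand defeats it.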

--Guido van Rossum (home page: http://www.python.org/~guido/)