print() and unicode strings (python 3.1)

Ned Deily nad at acm.org
Tue Aug 25 00:09:53 EDT 2009


In article 
<e5e2ec2e-2b4a-4ca8-8c0f-109e5f4eb542 at v23g2000pro.googlegroups.com>,
 7stud <bbxx789_05ss at yahoo.com> wrote:

> On Aug 24, 2:41 pm, "Martin v. Löwis" <mar... at v.loewis.de> wrote:
> > > I can't figure out a way to programmatically set the encoding for
> > > sys.stdout.  So where does that leave me?
> >
> > You should be setting the terminal encoding administratively, not
> > programmatically.
> >
> 
> The terminal encoding has always been utf-8.  It was not set
> programmatically.
> 
> It seems to me that python 3.1's string handling is broken.
> Apparently, in python 3.1 I am unable to explicitly set the encoding
> of a string and print() it out with the result being human readable
> text.  On the other hand, if I let python do the encoding implicitly,
> python uses a codec I don't want it to.

If you are running on a Unix-y system, check your locale settings (LANG, 
LC_*, et al).  I think you'll likely find that your locale is not really 
UTF-8.  The following was run with Python 3.1 on OS X 10.5, with similar 
results on Debian Linux:

$ cat t3.py
import sys
print(sys.stdout.encoding)
s = "€"
print(s.encode("utf-8"))
print(s)

$ export LANG=en_US.UTF-8
$ python3.1 t3.py
UTF-8
b'\xe2\x82\xac'
€

$ export LANG=C
$ python3.1 t3.py
US-ASCII
b'\xe2\x82\xac'
Traceback (most recent call last):
  File "t3.py", line 7, in <module>
    print(s)
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in 
position 0: ordinal not in range(128)
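
The same effect can be reproduced without touching the locale.  The 
following is only a rough sketch, not part of the runs above: it assumes 
that print() hands the str to a text stream, which encodes it with that 
stream's own encoding.  The helper name encode_for_stream is made up for 
illustration; io.BytesIO and io.TextIOWrapper are from the standard 
library.

```python
import io

def encode_for_stream(text, encoding):
    """Write text through a text stream with the given encoding and
    return the bytes that reached the underlying binary buffer."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

s = "\u20ac"  # the euro sign, written as an escape to keep the source ASCII

# A UTF-8 stream produces the same three bytes as s.encode("utf-8").
print(encode_for_stream(s, "utf-8"))

# An ASCII stream cannot represent the character, as with LANG=C.
try:
    encode_for_stream(s, "ascii")
except UnicodeEncodeError as exc:
    print("ascii stream failed:", exc.reason)
```

If changing the locale is really not an option, wrapping sys.stdout.buffer 
in an io.TextIOWrapper with an explicit encoding is one possible way to 
choose the encoding from inside the program, but fixing the locale is 
usually the cleaner route.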

-- 
 Ned Deily,
 nad at acm.org



