q: how to output a unicode string?

Frank Stajano usenet423.4.fms at neverbox.com
Wed Apr 25 12:49:51 EDT 2007


Diez B. Roggisch wrote:
>> So why is it that in the first case I got UnicodeEncodeError: 'ascii'
>> codec can't encode? It seems as if, within Idle, a utf-8 codec is being
>> selected automagically... why should that happen there and not in the
>> first case?
> 
> I'm a bit confused about what you did when... the error appears if you try to
> output a unicode object without encoding it first; then the default encoding
> (ascii) is used.

Here's a minimal example for you.
I put these four lines into a utf-8 file.

# -*- coding: utf-8 -*-
# this file is called t3.py
s1 = u"héllô wórld"
print s1


If I invoke "python t3.py" at the cygwin/rxvt/bash prompt, I get:

Traceback (most recent call last):
   File "t3.py", line 4, in <module>
     print s1
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

If I load the exact same file in Idle and press F5 (for Run), I get:

héllô wórld

So obviously "the system" is not behaving in the same way in the two
cases. Maybe Python senses that it can output utf-8 when it's running
inside Idle and sets the default to utf-8 without my asking for it, but
senses that it can't output utf-8 when it's in cygwin/rxvt/bash, so
there it sets the default codec to ascii. That's my best guess so far...
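One way to check that guess: as far as I understand it, the codec `print` falls back on is tied to the output stream (terminal, Idle's shell, a pipe), not to the `coding:` line, which only tells Python how to read the source file. The sketch below is written for Python 3, unlike t3.py, and `stream_encoding` and `write_utf8` are hypothetical helper names. It reports the encoding the current stdout advertises, then sidesteps that default by encoding to UTF-8 explicitly, as Diez suggests.

```python
import sys

# The same text as s1 in t3.py, spelled with escapes so that this
# file's own source encoding does not matter.
S1 = u"h\u00e9ll\u00f4 w\u00f3rld"


def stream_encoding(stream=None):
    """Return the encoding an output stream reports, or None if it has none."""
    if stream is None:
        stream = sys.stdout
    return getattr(stream, "encoding", None)


def write_utf8(text, stream=None):
    """Encode text as UTF-8 explicitly, write it plus a newline, return the bytes.

    If the stream exposes an underlying byte buffer, the UTF-8 bytes are
    written there directly, bypassing the stream's own encoding. Some
    shells replace sys.stdout with a text-only object that has no
    buffer; in that case the text is written to the stream as-is.
    """
    if stream is None:
        stream = sys.stdout
    data = text.encode("utf-8")
    buffer = getattr(stream, "buffer", None)
    if buffer is not None:
        stream.flush()  # keep ordering with anything already written as text
        buffer.write(data + b"\n")
        buffer.flush()
    else:
        stream.write(text + u"\n")
    return data


if __name__ == "__main__":
    print("stdout encoding:", stream_encoding())
    write_utf8(S1)
```

Running this once at the bash prompt and once in Idle should show whether the two report different encodings. In the Python 2 t3.py itself, the equivalent explicit step would be `print s1.encode("utf-8")`; whether the resulting bytes then display correctly still depends on the terminal being set up for UTF-8.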

I find the encode/decode terminology somewhat confusing, because 
arguably both sides are "encoded". For example, a unicode-encoded string 
(I mean a sequence of unicode code points) should count as "decoded" in 
the terminology of this framework, right?
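In that framework a sequence of code points (Python 2's `unicode`, Python 3's `str`) is the "decoded" side, and "encoded" always means a concrete byte sequence produced by one particular codec. A small Python 3 sketch of both directions, using the same text as t3.py:

```python
# The same text as s1 in t3.py, as a sequence of code points.
text = u"h\u00e9ll\u00f4 w\u00f3rld"

# encode: code points -> bytes, using a named codec.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# decode: bytes -> code points; you have to know which codec produced the bytes.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text
# Decoding with the wrong codec does not always fail, it just gives wrong text.
assert utf8_bytes.decode("latin-1") != text

# One text, two different byte sequences.
print(len(text), len(utf8_bytes), len(latin1_bytes))  # 11 14 11

# The ascii codec has no byte for u'\xe9', which is the error in the traceback.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc.reason)  # ordinal not in range(128)
```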

Anyway, thanks again for your help, for deepening my modest 
understanding of the issue and for solving my original problem!



More information about the Python-list mailing list