UTF-8 output problems

Sat Mar 10 09:32:31 EST 2007

Michael B. Trausch wrote:

> I am having a slight problem with UTF-8 output with Python.  I have the
> following program:
> 
> x = 0
> 
> while x < 0x4000:
>     print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
>     x += 1
> 
> This program works perfectly when run directly:
> 
> mbt@pepper:~/tmp$ python test.py
> This is Unicode code point 0 (0x0):
> This is Unicode code point 1 (0x1):
> This is Unicode code point 2 (0x2):
> This is Unicode code point 3 (0x3):
> This is Unicode code point 4 (0x4):
> This is Unicode code point 5 (0x5):
> This is Unicode code point 6 (0x6):
> This is Unicode code point 7 (0x7):
> This is Unicode code point 8 (0x8):
> This is Unicode code point 9 (0x9):
> This is Unicode code point 10 (0xa):
> (... continued)
> 
> However, when I attempt to redirect the output to a file:
> 
> mbt@pepper:~/tmp$ python test.py >f
> Traceback (most recent call last):
>   File "test.py", line 6, in <module>
>     print u"This is Unicode code point %d (0x%x): %s" % (x, x, unichr(x))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 39: ordinal not in range(128)
> 
> This is slightly confusing to me.  The output goes all the way to the
> end of the program when it is not redirected.  Why is Python treating
> the situation differently when the output is redirected?  This failure
> occurs for all redirection, by the way: >, >>, 1>2, pipes, and so forth.
> 
> Any ideas?

To complement Marc's reply: you can open a file with a specific encoding
(see the codecs.open() function) and use print >> f, ... to write to that
file.
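
For illustration, here is a minimal sketch of that approach. The helper
name write_code_points and the output path are made up for the example.
It calls f.write() rather than print >> f so that the same code also runs
on Python 3; on Python 2, print >> f, text should work on the same file
object. The file object encodes each unicode string as UTF-8 itself, so
the result does not depend on whether stdout is a terminal or redirected.

```python
import codecs

try:
    unichr            # Python 2 has unichr()
except NameError:
    unichr = chr      # on Python 3, chr() does the same job


def write_code_points(path, start, stop, encoding="utf-8"):
    """Write one line per code point in [start, stop) to path.

    Hypothetical helper for illustration only.
    """
    # codecs.open() returns a file object that encodes unicode text
    # with the given encoding on every write.
    f = codecs.open(path, "w", encoding=encoding)
    try:
        for x in range(start, stop):
            f.write(u"This is Unicode code point %d (0x%x): %s\n"
                    % (x, x, unichr(x)))
    finally:
        f.close()


if __name__ == "__main__":
    # Same range as the original test program; "f" is the file name
    # used in the redirection example above.
    write_code_points("f", 0, 0x4000)
```

With this, running python test.py without any shell redirection writes
the file directly, and the 'ascii' codec is never involved.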

A+

Laurent.