the stupid encoding problem to stdout

Wed Jun 8 22:39:44 EDT 2011

Sérgio Monteiro Basto <sergiomb at sapo.pt> writes:

> ./test.py
> moçambique
> moçambique

In this case standard output is a terminal, which reports its encoding
to Python, and it's capable of displaying the UTF-8 data that you send
to it in both cases.

> ./test.py > output.txt
> Traceback (most recent call last):
>   File "./test.py", line 5, in <module>
>     print u
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)

In this case standard output is not a terminal (since you're redirecting
output to a file), so Python is given no preference for the encoding.
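
To see what Python detected in each run, something like the following
could be added to the script. It is only a sketch: `describe_stdout` is
a hypothetical helper name, and the report goes to stderr so that it
still shows up when stdout is redirected. Under Python 2 the encoding is
normally None when stdout is a file (unless PYTHONIOENCODING is set);
Python 3 falls back to a locale-based default instead.

```python
import sys


def describe_stdout(stream=sys.stdout):
    """Return (is_terminal, encoding) for the given stream."""
    # isatty() is True only when the stream is connected to a terminal.
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    # The encoding attribute may be missing or None, e.g. when redirected.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


if __name__ == "__main__":
    is_tty, encoding = describe_stdout()
    # Report on stderr so the message is visible even with "> output.txt".
    sys.stderr.write(
        "stdout is a terminal: %r, encoding: %r\n" % (is_tty, encoding))
```

Running the script once directly and once with `> output.txt` should
show the two situations side by side.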

In the first print statement you specify the encoding UTF-8, which is
capable of encoding the characters.

In the second print statement you haven't specified any encoding, so the
default ASCII encoding is used, and ASCII cannot encode 'ç' (u'\xe7').
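
The difference between the two cases can be reproduced without any
terminal or file involved. The original test.py is not shown in this
message, so the following is an assumed reconstruction of the idea
rather than the actual script; the variable names are illustrative.

```python
# The character is written as an escape so the source file itself
# needs no particular encoding.
u = u'mo\xe7ambique'

# Explicit encoding: UTF-8 can represent every character in the string.
encoded = u.encode('utf-8')

# ASCII cannot represent u'\xe7', so this raises UnicodeEncodeError,
# the same error shown in the traceback above.
try:
    u.encode('ascii')
    error = None
except UnicodeEncodeError as exc:
    error = exc
```

Here `error.start` is the index of the offending character, matching
"position 2" in the traceback.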

Moral of the tale: Make sure an encoding is specified whenever data
steps between bytes and characters.

> Don't seems logic, when send things to a file the beaviour change.

The terminal and output.txt are different files, which have been opened
with different encodings. If you want a different encoding, you need to
specify that.
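
One way to specify the encoding is to have the program open the output
file itself instead of relying on shell redirection. A minimal sketch,
assuming `io.open` (available in Python 2.6 and later, and the same as
the built-in `open` in Python 3); the temporary path is only there to
keep the example self-contained.

```python
import io
import os
import tempfile

text = u'mo\xe7ambique'

# A throwaway location for the example; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# The encoding is stated explicitly, so characters are converted to
# UTF-8 bytes on write regardless of terminal or locale settings.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Read the file back in binary mode to inspect the bytes on disk.
with io.open(path, 'rb') as f:
    raw = f.read()
```

If redirecting stdout is a requirement, setting the PYTHONIOENCODING
environment variable (for example to utf-8) before running the script is
another option.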

-- 
 \               “There's no excuse to be bored. Sad, yes. Angry, yes. |
  `\    Depressed, yes. Crazy, yes. But there's no excuse for boredom, |
_o__)                                          ever.” —Viggo Mortensen |
Ben Finney