the stupid encoding problem to stdout

Sérgio Monteiro Basto sergiomb at sapo.pt
Thu Jun 9 17:14:17 EDT 2011


Benjamin Kaplan wrote:

> 2011/6/8 Sérgio Monteiro Basto <sergiomb at sapo.pt>:
>> hi,
>> cat test.py
>> #!/usr/bin/env python
>> #-*- coding: utf-8 -*-
>> u = u'moçambique'
>> print u.encode("utf-8")
>> print u
>>
>> chmod +x test.py
>> ./test.py
>> moçambique
>> moçambique
>>
>> ./test.py > output.txt
>> Traceback (most recent call last):
>> File "./test.py", line 5, in <module>
>> print u
>> UnicodeEncodeError: 'ascii' codec can't encode character
>> u'\xe7' in position 2: ordinal not in range(128)
>>
>> This is with Python 2.7.
>> How do I tell Python to send the same thing to stdout and to
>> the file output.txt?
>>
>> It doesn't seem logical that the behaviour changes when the
>> output is sent to a file.
>>
>> Thanks,
>> Sérgio M. B.
> 
> That's not a terminal vs file thing. It's a "file that declares its
> encoding" vs a "file that doesn't declare its encoding" thing. Your
> terminal declares that it is UTF-8. So when you print a Unicode string
> to your terminal, Python knows that it's supposed to turn it into
> UTF-8. When you pipe the output to a file, that file doesn't declare
> an encoding. So rather than guess which encoding you want, Python
> defaults to the lowest common denominator: ASCII. If you want
> something to be a particular encoding, you have to encode it yourself.

Exactly the opposite: if Python doesn't know the encoding, it should not
try to encode to ASCII.
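
To see what Python actually detects in each case, here is a minimal sketch.
The helper name describe_stream is made up for illustration. Under Python 2.7
a run with "> output.txt" is expected to report an encoding of None, which is
when the ASCII fallback applies; Python 3 picks a locale-based encoding
instead.

```python
import sys


def describe_stream(stream):
    """Return (is_tty, encoding) as Python reports them for a stream."""
    is_tty = stream.isatty()
    # encoding may be None (or missing) when nothing declares one,
    # e.g. stdout redirected to a file under Python 2.
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


if __name__ == "__main__":
    # Report on stderr so the message is still visible with "> output.txt".
    sys.stderr.write("stdout: tty=%r encoding=%r\n" % describe_stream(sys.stdout))
```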

> 
> You have a couple of choices on how to make it work:
> 1) Play dumb and always encode as UTF-8. This would look really weird
> if someone tried running your program in a terminal with a CP437
> encoding (like cmd.exe on at least the US version of Windows), but it
> would never crash.

I want Python not to care about the terminal's encoding and to send the
characters as they are, whether the output goes to a terminal or to a file.
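
A minimal sketch of that "always UTF-8" approach, assuming it is acceptable
to bypass whatever the stream declares. The helper name write_utf8 is
hypothetical; the getattr lookup is only there so the same file runs on
Python 2 and Python 3.

```python
import sys


def write_utf8(text, binary_stream):
    """Encode text as UTF-8 and write the raw bytes to binary_stream."""
    binary_stream.write(text.encode("utf-8"))


if __name__ == "__main__":
    # Python 3 exposes the byte stream as sys.stdout.buffer;
    # in Python 2, sys.stdout itself accepts byte strings.
    raw_stdout = getattr(sys.stdout, "buffer", sys.stdout)
    write_utf8(u"mo\xe7ambique\n", raw_stdout)
```

Because the bytes are produced before they reach the stream, the result
should be the same on a terminal and with "> output.txt".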

> 2) Check sys.stdout.encoding. If it's ascii, then encode your unicode
> string in the unicode-escape encoding, which substitutes the escape
> sequence in for all non-ASCII characters.

How do I change sys.stdout.encoding to always be UTF-8? At least then I
would have a consistent sys.stdout.encoding.
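
Two ways this is commonly done, sketched under assumptions. One is the
PYTHONIOENCODING environment variable (available since Python 2.6), for
example "PYTHONIOENCODING=utf-8 ./test.py > output.txt". The other is to
wrap the byte stream in a UTF-8 writer from the codecs module, as below.
The helper name utf8_writer is made up for illustration.

```python
import codecs
import sys


def utf8_writer(binary_stream):
    """Wrap a byte stream so that unicode text written to it becomes UTF-8."""
    return codecs.getwriter("utf-8")(binary_stream)


if __name__ == "__main__":
    # Python 3 exposes the byte stream as sys.stdout.buffer;
    # in Python 2, sys.stdout itself is the byte stream.
    raw_stdout = getattr(sys.stdout, "buffer", sys.stdout)
    out = utf8_writer(raw_stdout)
    out.write(u"mo\xe7ambique\n")
```

In Python 2 the same wrapper is often assigned back to sys.stdout so that
plain print statements go through it.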

> 3) Check to see if sys.stdout.isatty() and have different behavior for
> terminals vs files. If you're on a terminal that doesn't declare its
> encoding, encoding it as UTF-8 probably won't help. If you're writing
> to a file, that might be what you want to do.
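
Options 2 and 3 could be combined roughly as follows. This is only a sketch:
the function names and the exact policy (UTF-8 for files, backslash escapes
as the fallback) are assumptions, not something prescribed above.

```python
def pick_output_encoding(stream):
    """Return (encoding, errors) for printing unicode text to stream."""
    declared = getattr(stream, "encoding", None)
    if declared:
        # Trust the declared encoding, but escape what it cannot represent
        # (for example everything non-ASCII when it is "ascii").
        return declared, "backslashreplace"
    if not stream.isatty():
        # A file or pipe that declares nothing: assume UTF-8 is wanted.
        return "utf-8", "strict"
    # A terminal that declares nothing: stay within ASCII.
    return "ascii", "backslashreplace"


def encode_for(stream, text):
    """Encode text to bytes using the policy chosen for stream."""
    encoding, errors = pick_output_encoding(stream)
    return text.encode(encoding, errors)
```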


Thanks,




