Unicode problem

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Sat Apr 7 15:46:49 EDT 2007


Rehceb Rotkiv wrote:

> #!/usr/bin/python
> import sys
> import codecs
> fileHandle = codecs.open(sys.argv[1], 'r', 'utf-8')
> fileString = fileHandle.read()
> print fileString
>
> if I call it from a Bash shell like this
>
> $ ./test.py testfile.utf8.txt
>
> it works just fine, but when I try to pipe the output to another process
> ("|") or into a file (">"), e.g. like this
>
> $ ./test.py testfile.utf8.txt | cat
>
> I get an error:
>
> Traceback (most recent call last):
>   File "./test.py", line 6, in ?
>     print fileString
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in
> position 538: ordinal not in range(128)
>
> I absolutely don't know what's the problem here, can you help?

Using codecs.open, when you read the file you get Unicode. When you
print the Unicode object, it is encoded using your terminal's default
encoding (utf8, I presume?).
But when you redirect the output, it is no longer connected to your
terminal, so no encoding can be assumed and the default encoding
(ascii) is used. ascii cannot represent u'\xe4', hence the
UnicodeEncodeError.
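
To illustrate the failure outside of print, here is a minimal sketch
that encodes the character from the traceback by hand. The variable
names are made up for the example; it only assumes that the implicit
conversion behaves like an explicit encode with the 'ascii' codec.

```python
text = u"\xe4"  # the character named in the traceback

# Encoding with 'ascii' fails, because the code point is above 127.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An encoding that can represent the character succeeds.
utf8_bytes = text.encode("utf-8")
```

After this runs, ascii_failed is True and utf8_bytes holds the two
bytes 0xC3 0xA4.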

Try this line at the top:
print "stdout:", sys.stdout.encoding, "default:", sys.getdefaultencoding()
I get
stdout: ANSI_X3.4-1968 default: ascii
normally and
stdout: None default: ascii
when redirected.

You have to encode the Unicode object explicitly:
print fileString.encode("utf-8")
(or any other suitable encoding; I said utf-8 just because you read the
input file using that)
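
If you would rather not hard-code the encoding, one possible approach
is a small helper that uses the stream's own encoding when it reports
one and falls back to utf-8 otherwise. This is only a sketch:
encode_for_stream and its fallback parameter are hypothetical names,
and the "replace" error handler is an assumption on my part (it
substitutes "?" for characters the target encoding cannot represent
instead of raising).

```python
import sys

def encode_for_stream(text, stream, fallback="utf-8"):
    """Encode text for stream, using fallback if the stream has no encoding."""
    # stream.encoding is None when output is redirected (see above),
    # so fall back to an explicit choice in that case.
    encoding = getattr(stream, "encoding", None) or fallback
    # "replace" is an assumed policy: unencodable characters become "?".
    return text.encode(encoding, "replace")
```

In a script like the one in this thread, the result of
encode_for_stream(fileString, sys.stdout) would be printed in place of
fileString itself.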

--
Gabriel Genellina



