q: how to output a unicode string?

Diez B. Roggisch deets at nospam.web.de
Tue Apr 24 12:43:16 EDT 2007


Frank Stajano wrote:

> A simple unicode question. How do I print?
> 
> Sample code:
> 
> # -*- coding: utf-8 -*-
> s1 = u"héllô wórld"
> print s1
> # Gives UnicodeEncodeError: 'ascii' codec can't encode character
> # u'\xe9' in position 1: ordinal not in range(128)
> 
> 
> What I actually want to do is slightly more elaborate: read from a text
> file which is in utf-8, do some manipulations of the text and print the
> result on stdout. I understand I must open the file with
> 
> f = codecs.open("input.txt", "r", "utf-8")
> 
> but then I get stuck as above.
> 
> I tried
> 
> s2 = s1.encode("utf-8")
> print s2
> 
> but got
> 
> héllô wórld

Which is perfectly alright: s2 holds correct UTF-8 bytes. It's just that your
terminal isn't decoding them as UTF-8 but as some other encoding, like latin1,
so each two-byte UTF-8 sequence shows up as two separate characters.
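
To make that concrete, here is a small sketch that reproduces the garbled
output by decoding the UTF-8 bytes as latin1, which is roughly what the
terminal is doing. The helper encode_for_terminal is a made-up name for
illustration, and falling back to UTF-8 when the terminal reports no encoding
is my assumption, not a rule. The string is written with escapes so the
snippet does not depend on the source file's encoding.

```python
import sys

# Same text as s1 in the question, spelled with escapes.
s1 = u"h\xe9ll\xf4 w\xf3rld"

# Encoding to UTF-8 turns each accented character into two bytes.
utf8_bytes = s1.encode("utf-8")

# A terminal expecting latin1 reads each of those bytes as one character.
misread = utf8_bytes.decode("latin-1")


def encode_for_terminal(text, fallback="utf-8"):
    """Hypothetical helper: encode text with the terminal's own encoding.

    sys.stdout.encoding can be None (for example when output is piped),
    so a fallback is assumed here.
    """
    encoding = getattr(sys.stdout, "encoding", None) or fallback
    # "replace" substitutes '?' for characters the terminal cannot show.
    return text.encode(encoding, "replace")
```

Here misread is exactly the "hÃ©llÃ´ wÃ³rld" from your output. On Python 2,
"print encode_for_terminal(s1)" should then show the text correctly, provided
the terminal really uses the encoding it reports.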
 
> Then, in the hope of being able to write the string to a file if not to
> stdout, I also tried
> 
> 
> import codecs
> f = codecs.open("out.txt", "w", "utf-8")
> f.write(s2)
> 
> but got
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)

Instead of writing s2 (which is a byte string!), write s1. It will work.

The error you get stems from f.write wanting a unicode object, but s2 is a
byte string (you explicitly encoded it before), so Python first tries to
decode the byte string with the default encoding, ascii, to get a unicode
object. This of course fails on the first non-ASCII byte, 0xc3.
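
A minimal sketch of both halves of this, under some assumptions: the ascii
decode is done explicitly here to show what Python 2 attempts implicitly, the
file goes to a temporary directory rather than out.txt, and io.open is used
in place of codecs.open so the snippet behaves the same on Python 2 and 3.

```python
import io
import os
import tempfile

s1 = u"h\xe9ll\xf4 w\xf3rld"   # the unicode object
s2 = s1.encode("utf-8")        # the byte string

# What the implicit conversion amounts to: decoding UTF-8 bytes as ascii.
try:
    s2.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Writing the unicode object lets the file object do the encoding.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(s1)

# Reading the file back in binary mode shows the UTF-8 bytes on disk.
with io.open(path, "rb") as f:
    raw = f.read()
```

error_message carries the same complaint as your traceback (byte 0xc3 in
position 1), and raw equals s2, so the file contains the UTF-8 you wanted.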

Diez