q: how to output a unicode string?

Diez B. Roggisch deets at nospam.web.de
Wed Apr 25 06:46:04 EDT 2007


> So why is it that in the first case I got UnicodeEncodeError: 'ascii'
> codec can't encode? Seems as if, within Idle, a utf-8 codec is being
> selected automagically... why should that be so there and not in the
> first case?

I'm a bit confused about what exactly you did, and when. The error appears
if you try to output a unicode object without encoding it first: then the
default encoding (ascii) is used, and it fails on any non-ASCII character.
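A minimal sketch of that failure, assuming Python 3, where str plays the role of the unicode object and bytes that of the bytestring. The sample text is hypothetical, and the explicit .encode("ascii") call stands in for the implicit default-encoding step. Encoding explicitly with a codec that can represent the characters avoids the error.

```python
# Python 3 sketch: str stands in for Python 2's unicode object.
s1 = "Gr\u00fc\u00dfe"  # hypothetical sample text with non-ASCII characters

# Encoding with the ASCII codec fails, like the implicit default-encoding
# step when a unicode object is output without being encoded first.
try:
    s1.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ascii codec failed:", exc.reason)  # ordinal not in range(128)

# Encoding explicitly with a codec that covers these characters works.
s2 = s1.encode("utf-8")
print(s2)  # b'Gr\xc3\xbc\xc3\x9fe'
```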
 
>>> Then, in the hope of being able to write the string to a file if not to
>>> stdout, I also tried
>>>
>>>
>>> import codecs
>>> f = codecs.open("out.txt", "w", "utf-8")
>>> f.write(s2)
>>>
>>> but got
>>>
>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
>>> ordinal not in range(128)
>> 
>> Instead of writing s2 (which is a byte-string!!!), write s1. It will
>> work.
> 
> OK, many thanks, I got this to work!
> 
>> The error you get stems from f.write wanting a unicode object, but s2 is
>> a bytestring (you explicitly encoded it before), so Python tries to
>> decode the bytestring into a unicode object using the default encoding,
>> ascii. This of course fails.
> 
> I think I have a better understanding of it now. If the terminal hadn't
> fooled me, I probably wouldn't have assumed that the code I originally
> wrote (following the first examples I found) was wrong! I assume that
> when you say "bytestring" you mean "a string of bytes in a certain
> encoding (here utf-8) that can be used as an external representation for
> the unicode string which is instead a sequence of code points".

Yes. That is exactly the difference. 
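The difference can be seen directly in a short sketch. It assumes Python 3, where the unicode type is spelled str and bytestrings are bytes; the sample string and the temporary file location are made up for illustration. The text object counts code points, the encoded object counts bytes, and the codecs writer is handed the text and does the encoding itself.

```python
import codecs
import os
import tempfile

s1 = "Gr\u00fc\u00dfe"   # text: a sequence of code points (hypothetical sample)
s2 = s1.encode("utf-8")  # bytes: one external representation of that text

print(len(s1), len(s2))  # 5 7  (u-umlaut and sharp s take two bytes each in UTF-8)

# codecs.open returns a writer that encodes whatever text it is given,
# so pass it the text object (s1), not the already-encoded bytes (s2).
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with codecs.open(path, "w", "utf-8") as f:
    f.write(s1)

# Reading the raw bytes back shows the UTF-8 representation on disk.
with open(path, "rb") as raw:
    on_disk = raw.read()
print(on_disk == s2)  # True
```

In Python 3, handing s2 to the same writer raises a TypeError rather than the UnicodeDecodeError quoted earlier, because bytes are no longer implicitly decoded with ascii. There the built-in open(path, "w", encoding="utf-8") is the more common spelling; codecs.open is used here only to mirror the code in the thread.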

Diez
