Python nuube needs Unicode help

Diez B. Roggisch deets at nospam.web.de
Thu Jan 11 18:20:24 EST 2007


gheissenberger at gmail.com wrote:
> HELP!
> Guy who was here before me wrote a script to parse files in Python.
> 
> Includes line:
> print u
> where u is a line from a file we are parsing.
> However, we have started receiving data from Brazil. If I open file to
> parse in VI, looks like:
> 
> <Utt id="3" transcribe="yes" audioRoot="A1"
> audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
> recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
> transcribedText="não" parsableText="não"/
> 
> Clearly those "nã" are some non-Ascii characters, but how do I get
> print to understand that?
> 
> I keep getting:
> "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
> position 40:
>  ordinal not in range(128)"
> 

Does the error happen at the

print u

line? If so, you are trying to print a unicode object, which means it
has to be converted (the right term is encoded) to a byte string first.
If you don't do that explicitly, it is done implicitly, using the
default encoding, which is ASCII.

If you have non-ascii characters, you end up with the error you see.
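
To see the failure in isolation, here is a minimal sketch that makes the
implicit step explicit. The variable names and the sample string are
made up for illustration; only the character u'\xe3' comes from the
error message quoted above.

```python
# Hypothetical sample: "não" spelled with an escape so the source stays ASCII.
text = u"n\xe3o"

try:
    # Roughly what the implicit conversion attempts when the
    # encoding in use is ASCII.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc
    # exc.start is the index of the offending character in the string.
    print("cannot encode character at position %d: %s" % (exc.start, exc.reason))
```

The position reported here is 1 rather than 40 because the sample string
is much shorter than the real line from the file.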

What to do? Use something like this:

print u.encode('utf-8')

instead.
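
As a rough illustration of what that call produces (the sample string
and variable names are again assumptions, not taken from the original
script), the encoded result is a plain byte string in which the
non-ASCII character takes two bytes:

```python
# Hypothetical sample: "não" with the same character as in the error message.
text = u"n\xe3o"

# Explicit encode, as suggested above.
encoded = text.encode("utf-8")

# U+00E3 becomes the two bytes 0xC3 0xA3 in UTF-8.
assert encoded == b"n\xc3\xa3o"

# Decoding with the same codec gives back the original unicode object.
assert encoded.decode("utf-8") == text
```

Whether the output then displays correctly depends on the terminal or
file consumer expecting UTF-8; if it expects something else, such as
Latin-1, pass that codec name instead.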

Diez


