Python nuube needs Unicode help

Thu Jan 11 17:01:25 EST 2007

On 11 Jan 2007 13:28:14 -0800, gheissenberger at gmail.com
<gheissenberger at gmail.com> wrote:
> HELP!
> Guy who was here before me wrote a script to parse files in Python.
>
> Includes line:
> print u
> where u is a line from a file we are parsing.
> However, we have started receiving data from Brazil. If I open file to
> parse in VI, looks like:
>
> <Utt id="3" transcribe="yes" audioRoot="A1"
> audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
> recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
> transcribedText="não" parsableText="não"/
>
> Clearly those "nã" are some non-ASCII characters, but how do I get
> print to understand that?
>
> I keep getting:
> "UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
> position 40:
>  ordinal not in range(128)"
>

Find out what encoding the files are in and modify the script to use it.
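
Note that the traceback is an *encode* error: the line is already a Unicode
string, and the failure happens when print tries to write it through the
'ascii' codec. A minimal sketch of one way to handle both ends explicitly
follows. The encodings are assumptions (Latin-1 for the input, UTF-8 for the
output) and must be replaced with whatever your files and terminal really
use; the helper names read_lines and encode_for_output are hypothetical, not
part of the existing script.

```python
import io

# Assumption: the input files are Latin-1 (ISO-8859-1). Replace this with
# the encoding the data supplier actually uses.
INPUT_ENCODING = "latin-1"
# Assumption: the terminal or log file expects UTF-8.
OUTPUT_ENCODING = "utf-8"


def read_lines(path, encoding=INPUT_ENCODING):
    """Return the file's lines as Unicode strings, decoded explicitly."""
    with io.open(path, encoding=encoding) as handle:
        return [line.rstrip("\n") for line in handle]


def encode_for_output(text, encoding=OUTPUT_ENCODING):
    """Encode explicitly instead of letting print fall back to ASCII."""
    return text.encode(encoding)
```

With helpers like these, the script would print encode_for_output(u) rather
than u itself. io.open needs Python 2.6 or later; on older versions
codecs.open(path, encoding=...) plays the same role.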