Python nuube needs Unicode help

Thu Jan 11 23:05:45 EST 2007

At Thursday 11/1/2007 18:27, gheissenberger at gmail.com wrote:

>HELP!
>Guy who was here before me wrote a script to parse files in Python.
>
>Includes line:
>print u
>where u is a line from a file we are parsing.
>However, we have started receiving data from Brazil. If I open the file to
>parse in vi, it looks like:
>
><Utt id="3" transcribe="yes" audioRoot="A1"
>audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
>recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
>transcribedText="não" parsableText="não"/>

Is this part of an XML document? If so, you should use a
real XML parser instead of parsing it by hand.
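
As a rough sketch of what that could look like, here is the record from the
question read with the standard library's xml.etree.ElementTree. It is written
in Python 3 syntax. The sample string and the helper name parse_utt are
illustrative assumptions, and it assumes the file stores the character as the
reference &#227;, as the question suggests. The parser resolves the reference,
so the attribute values come back as ordinary Unicode text.

```python
import xml.etree.ElementTree as ET

# Sample record copied from the question (assumed form); &#227; is the
# numeric character reference for U+00E3.
SAMPLE = (
    '<Utt id="3" transcribe="yes" audioRoot="A1" '
    'audio="313-20070102144528.wav" grammarSet="G3" rawText="n&#227;o" '
    'recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0" '
    'transcribedText="n&#227;o" parsableText="n&#227;o"/>'
)


def parse_utt(xml_text):
    """Return the attributes of a single <Utt> element as a dict (hypothetical helper)."""
    element = ET.fromstring(xml_text)
    return dict(element.attrib)


attrs = parse_utt(SAMPLE)
raw_text = attrs["rawText"]  # Unicode text with the reference already resolved
```

A real input file would presumably hold many such elements inside a root
element, in which case ET.parse() plus iteration over the tree would replace
the single fromstring() call.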

>Clearly those "n&#227" are some non-Ascii characters, but how do I get
>print to understand that?

Understanding how Unicode works may be very 
useful: http://www.amk.ca/python/howto/unicode

>I keep getting:
>"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
>position 40:
>  ordinal not in range(128)"

py> u = u"áéíóú"
py> print u, repr(u)
áéíóú u'\xe1\xe9\xed\xf3\xfa'
py> print str(u)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
py> print u.encode('cp850')
áéíóú

(cp850 is my console encoding)
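
The same idea, sketched for Python 3 as a small helper: encode explicitly to
the output encoding instead of relying on the ASCII default, and replace
characters the target cannot represent rather than raising. The helper name
encode_for_console and the UTF-8 fallback are my assumptions, not part of the
original script.

```python
import sys


def encode_for_console(text, encoding=None):
    """Encode text for output, replacing characters the target cannot represent."""
    # Use the stream's reported encoding; fall back to UTF-8 if it reports none.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace")


text = "\xe1\xe9\xed\xf3\xfa"  # the same five accented vowels as in the session above

# A strict ASCII encode fails, which is the error reported in the question.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

cp850_bytes = encode_for_console(text, "cp850")  # every character exists in cp850
ascii_bytes = encode_for_console(text, "ascii")  # unencodable characters become "?"
```

Whether replacing characters is acceptable depends on what the output is used
for; for data that gets processed further, writing UTF-8 to a file is usually
the safer choice.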

-- 
Gabriel Genellina
Softlab SRL 
