Python nuube needs Unicode help
Gabriel Genellina
gagsl-py at yahoo.com.ar
Thu Jan 11 23:05:45 EST 2007
At Thursday 11/1/2007 18:27, gheissenberger at gmail.com wrote:
>HELP!
>Guy who was here before me wrote a script to parse files in Python.
>
>Includes line:
>print u
>where u is a line from a file we are parsing.
>However, we have started receiving data from Brazil. If I open the file to
>parse in vi, it looks like:
>
><Utt id="3" transcribe="yes" audioRoot="A1"
>audio="313-20070102144528.wav" grammarSet="G3" rawText="não"
>recValue="{data:CHOICE=NO;}" conf="970" rawText2="" conf2="0"
>transcribedText="não" parsableText="não"/>
Is this part of an XML document? If so, you should use a
real XML parser instead of parsing it by hand.
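As a rough sketch of what that could look like, here is the standard library's xml.etree.ElementTree reading a cut-down version of the element quoted above. It is written for Python 3. The ISO-8859-1 declaration and the helper name utt_attributes are assumptions made for illustration; the real files may declare a different encoding or none at all.

```python
import xml.etree.ElementTree as ET

# Assumption: the file is Latin-1 and says so in its XML declaration.
# "\xe3" is the a-with-tilde character from the quoted data.
XML_BYTES = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<Utt id="3" rawText="n\xe3o" conf="970" parsableText="n\xe3o"/>'
).encode("latin-1")


def utt_attributes(xml_bytes):
    """Parse one <Utt> element and return its attributes as a dict of text strings."""
    element = ET.fromstring(xml_bytes)
    return dict(element.attrib)


attrs = utt_attributes(XML_BYTES)
# ascii() escapes non-ASCII characters, so this prints safely on any console.
print(ascii(attrs["rawText"]))
```

The parser does the decoding from the declared encoding, so the attribute values arrive as Unicode text and no hand-written string slicing is needed.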
>Clearly those "nã" are some non-ASCII characters, but how do I get
>print to understand that?
Understanding how Unicode works may be very useful:
http://www.amk.ca/python/howto/unicode
>I keep getting:
>"UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in
>position 40:
> ordinal not in range(128)"
py> u = u"áéíóú"
py> print u, repr(u)
áéíóú u'\xe1\xe9\xed\xf3\xfa'
py> print str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
py> print u.encode('cp850')
áéíóú
(cp850 is my console encoding; use whatever encoding your own terminal expects)
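The same idea, decoding the input to Unicode and then encoding explicitly for the output, can be sketched as a small helper. This version is written for Python 3 (the session above is Python 2). The name encode_for_output and the Latin-1 input encoding are assumptions for illustration only, not something taken from the original script.

```python
import sys


def encode_for_output(text, encoding=None, errors="replace"):
    """Encode text for a given output encoding.

    Falls back to the encoding reported by sys.stdout, then to ASCII.
    With errors="replace", characters the target cannot represent
    become "?" instead of raising UnicodeEncodeError.
    """
    target = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(target, errors)


# Assumption: the data file is Latin-1, so 0xE3 is a-with-tilde.
raw_line = b'rawText="n\xe3o"'
line = raw_line.decode("latin-1")

as_cp850 = encode_for_output(line, "cp850")  # representable in cp850
as_ascii = encode_for_output(line, "ascii")  # not representable, replaced
```

Whether replacing unencodable characters is acceptable depends on what the output is for; passing errors="strict" restores the exception so the problem is not silently hidden.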
--
Gabriel Genellina
Softlab SRL