Unicode

Anatoli Hristov tolidtm at gmail.com
Mon Dec 17 17:33:48 EST 2012


>> Just realize that once you start using 'ignore' you're going to also
>> ignore discrepancies that are real. For example, maybe your terminal is
>> actually something other than either latin-1 or utf-8.
>
> If you need to see such discrepancies, you can do
>
> print src.decode("utf-8").encode("latin-1", "xmlcharrefreplace")
>
>
> That would produce something like:
>
> processeurs Intel® Core&#8482; de 3ème génération av
>
> that is, the problem characters are displayed in &#...; notation.
> That is ugly, but sometimes it's the only way to see what character
> you really have.
>
> Notice that the number you get is in decimal, where the \u....
> notation uses hex:
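
To make the decimal/hex point concrete, here is a minimal sketch. It assumes Python 3 (the snippet quoted above is Python 2), and the sample string is only an illustration built from the product name in this thread, not the original poster's data.

```python
# Sample text (an assumption for illustration); \u2122 is the trademark sign.
text = "Intel\u00ae Core\u2122 de 3\u00e8me g\u00e9n\u00e9ration"

# Characters that latin-1 cannot represent are replaced by &#NNNN;
# references, with the code point written in decimal.
escaped = text.encode("latin-1", "xmlcharrefreplace")
print(escaped)  # b'Intel\xae Core&#8482; de 3\xe8me g\xe9n\xe9ration'

# The same code point in both notations.
code_point = ord("\u2122")
decimal_form = "&#%d;" % code_point   # '&#8482;'
hex_form = "\\u%04x" % code_point     # '\u2122' as six literal characters
print(decimal_form, hex_form)
```

Only the trademark sign is escaped here, because the other accented characters all exist in latin-1.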

Thanks, guys, my issue is now solved. The problem came from my PuTTY
client: it was set to latin1 by default, and changing it to utf-8
made everything work.
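
For anyone hitting the same symptom, the mismatch can be reproduced without a terminal. This is a rough sketch, assuming Python 3 and assuming the program writes UTF-8 bytes; decoding those bytes as latin-1 stands in for what a latin-1 terminal would display. The variable names and sample text are made up for the example.

```python
# Sample text (an assumption for illustration).
text = "3\u00e8me g\u00e9n\u00e9ration"

# The bytes the program sends to the terminal, assuming UTF-8 output.
utf8_bytes = text.encode("utf-8")

# A terminal set to latin-1 interprets every byte as one character,
# so each two-byte UTF-8 sequence shows up as two wrong characters.
shown_by_latin1_terminal = utf8_bytes.decode("latin-1")

# A terminal set to UTF-8 interprets the same bytes correctly.
shown_by_utf8_terminal = utf8_bytes.decode("utf-8")

print(shown_by_latin1_terminal)
print(shown_by_utf8_terminal)
```

The bytes are identical in both cases; only the terminal's interpretation changes, which is why switching the client's character set was enough.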



More information about the Python-list mailing list