UnicodeDecodeError issue

MRAB python at mrabarnett.plus.com
Mon Sep 2 07:56:29 EDT 2013


On 02/09/2013 12:38, Dave Angel wrote:
> On 2/9/2013 00:16, Ferrous Cranus wrote:
>>>
>>> Have you tried to decode those bytes in various encodings other than
>>> utf-8 ?
>>
>> No, because I wasn't aware of which string/variable they pertained to.
>>
>    http://pypi.python.org/pypi/chardet
>
> is a package which tries to 'guess' an encoding for a string of bytes.
> I happen to have the 2.7 version installed, but not the 3.x version, so
> the following is in 2.7. Same thing should work in 3.3....
>
> >>> import chardet
> >>> chardet.detect(b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2')
> {'confidence': 0.9638983132261467, 'encoding': 'windows-1253'}
> >>> print b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 \xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2'.decode('windows-1253')
> ¶γνωστοόνομα συστήματος
>
> I don't have a clue what it might be; it's not English, and I don't
> know what language it may be in.
>
You don't recognise Greek?

> Does that string make any sense to you?  You may want to try it on your
> own machine, since the email may obscure the encoding.  Or you might
> want to do the decode using whatever the default encoding is for that
> server.
>
> The Linux 'file' utility thinks this string is in ISO-8859, so you might
> want to try a decode('ISO-8859-1') as well (and maybe ISO-8859-2, -3,
> -4, and -5).
>
It's ISO-8859-7 (Greek).
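
To see the difference, here is a rough Python 3 sketch that decodes the
same bytes with a handful of candidate encodings. The helper name
try_decodings and the list of candidates are just made up for
illustration; the bytes are the ones quoted above. Byte 0xB6 is a
pilcrow in windows-1253 but a capital alpha with tonos in ISO-8859-7,
which would explain the stray first character in the chardet-based
decode.

```python
# Bytes quoted earlier in the thread.
raw = (b'\xb6\xe3\xed\xf9\xf3\xf4\xef\xfc\xed\xef\xec\xe1 '
       b'\xf3\xf5\xf3\xf4\xde\xec\xe1\xf4\xef\xf2')


def try_decodings(data, encodings):
    """Return {encoding: decoded text, or None if decoding failed}.

    Hypothetical helper, not part of any library.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Candidate list is an assumption; extend it as needed.
candidates = ['utf-8', 'iso-8859-1', 'windows-1253', 'iso-8859-7']
decoded = try_decodings(raw, candidates)

for name in candidates:
    print(name, '->', decoded[name])
```

A decode that succeeds is not necessarily the right one (ISO-8859-1
accepts any byte sequence), so the results still need a human who can
read the language to pick the sensible one.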


