recycling internationalized garbage
"Martin v. Löwis"
martin at v.loewis.de
Tue Mar 14 16:33:21 EST 2006
Ross Ridge wrote:
> aaronwmail-usenet at yahoo.com wrote:
>
>> try:
>>     (uni, dummy) = utf8dec(s)
>> except:
>>     (uni, dummy) = iso88591dec(s, 'ignore')
>
>
> Is there really any point in even trying to decode with UTF-8? You
> might as well just assume ISO 8859-1.
The point is that you can tell UTF-8 reliably. If the data decodes
as UTF-8, it *is* UTF-8, because no other encoding in the world
produces the same byte sequences (except for ASCII, which is
a UTF-8 subset).
So if it is not UTF-8, the guessing starts.
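A minimal sketch of the "try UTF-8 first, then guess" approach, written for Python 3 and assuming the input is a bytes object. The function name decode_guess is hypothetical and does not appear in the quoted code, and the ISO 8859-1 fallback is only one possible guess:

```python
def decode_guess(data):
    """Return (text, encoding_name) for a bytes object.

    Strict UTF-8 is tried first. If that fails, fall back to
    ISO 8859-1, which is only a guess at the real encoding.
    """
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO 8859-1 maps every byte value to a code point, so this
        # decode cannot fail, but the result may still be wrong.
        return data.decode("iso-8859-1"), "iso-8859-1"
```

Catching only UnicodeDecodeError, rather than using a bare except, keeps unrelated errors visible.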
Regards,
Martin