recycling internationalized garbage

"Martin v. Löwis" martin at v.loewis.de
Tue Mar 14 16:33:21 EST 2006


Ross Ridge wrote:
> aaronwmail-usenet at yahoo.com wrote:
> 
>>    try:
>>        (uni, dummy) = utf8dec(s)
>>    except UnicodeError:
>>        (uni, dummy) = iso88591dec(s, 'ignore')
> 
> 
> Is there really any point in even trying to decode with UTF-8?  You
> might as well just assume ISO 8859-1.

The point is that you can detect UTF-8 reliably. If the data decodes
as UTF-8, it *is* UTF-8, because no other encoding in the world
produces the same byte sequences (except for ASCII, which is
a subset of UTF-8).

So if it is not UTF-8, the guessing starts.
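
As a rough illustration of that order of operations, here is a minimal
sketch in present-day Python 3 (not the code from the original post).
The function name decode_guess is hypothetical, and ISO 8859-1 as the
fallback is only the guess discussed in this thread, not a detection:

```python
def decode_guess(data):
    """Return (text, encoding_used) for bytes of unknown encoding."""
    try:
        # Strict decoding raises on any byte sequence that is not
        # well-formed UTF-8, so success is strong evidence of UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO 8859-1 assigns a character to every byte value, so this
        # step cannot fail. It is a guess, not a detection.
        return data.decode("iso-8859-1"), "iso-8859-1"


# "Löwis" encoded as UTF-8: the strict decode succeeds.
text_a, enc_a = decode_guess(b"L\xc3\xb6wis")   # enc_a == "utf-8"

# "Löwis" encoded as ISO 8859-1: 0xF6 is not a valid UTF-8 start byte,
# so the strict decode fails and the fallback is used.
text_b, enc_b = decode_guess(b"L\xf6wis")       # enc_b == "iso-8859-1"
```

Pure ASCII input is reported as "utf-8" here, which is harmless since
both decoders give the same result for it.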

Regards,
Martin


