An attempt at guessing the encoding of a (non-unicode) string

Christos TZOTZIOY Georgiou tzot at sil-tec.gr
Mon Apr 5 05:46:58 EDT 2004


On Sat, 3 Apr 2004 12:22:05 -0800, rumours say that "Roger Binns"
<rogerb at rogerbinns.com> might have written:

>Christos TZOTZIOY Georgiou wrote:
>> This could be implemented as a function in codecs.py (let's call it
>> "wild_guess"), that is based on some pre-calculated data.

>Windows already has a related function:
>
>http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81np.asp

As far as I understand, this function tests whether its argument is valid
Unicode text, so it has little to do with the issue I brought up: take a
Python string (8-bit bytes) and try to guess its encoding (e.g.
iso8859-1, iso8859-7, etc.).
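A minimal sketch of the "wild_guess" idea, written for modern Python 3 (where an 8-bit string is `bytes`). It is a standalone hypothetical function, not anything that exists in codecs.py. The PROFILES table stands in for the "pre-calculated data". Its letter sets are illustrative placeholders, not corpus statistics. The scoring rule (the fraction of decoded letters found in an encoding's profile) is also an assumption of this sketch:

```python
# Hypothetical "pre-calculated data": for each candidate encoding, the
# lowercase letters that commonly occur in text written in that encoding.
# These sets are illustrative placeholders, not corpus statistics.
PROFILES = {
    "iso8859-1": set("abcdefghijklmnopqrstuvwxyz"
                     "àáâäçèéêëíîïñóôöùúûüß"),
    "iso8859-7": set("αβγδεζηθικλμνξοπρσςτυφχψω"
                     "άέήίόύώϊϋΐΰ"),
}


def wild_guess(data, profiles=PROFILES):
    """Guess which candidate encoding produced the byte string `data`.

    Each candidate is tried in turn. A codec that cannot decode the bytes
    is ruled out, and the rest are scored by the fraction of decoded
    letters found in that encoding's profile. Returns None if no
    candidate can decode `data`.
    """
    best, best_score = None, -1.0
    for encoding, expected in profiles.items():
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError:
            continue  # bytes not valid in this encoding: rule it out
        letters = [ch.lower() for ch in text if ch.isalpha()]
        if letters:
            score = sum(ch in expected for ch in letters) / len(letters)
        else:
            score = 0.0  # no letters at all: nothing to go on
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = encoding, score
    return best


print(wild_guess("Καλημέρα κόσμε".encode("iso8859-7")))
print(wild_guess("café crème".encode("iso8859-1")))
```

Trying the decode alone is only a validity test. A single-byte codec such as iso8859-1 accepts every possible byte string, so it never rules anything out. The pre-calculated letter data is what actually separates the candidates. For example, Greek text decoded as iso8859-1 yields a run of accented Latin letters, some of which are unlikely in Western European text.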

There must be a similar function behind the "auto guess encoding" feature
of MS Internet Explorer; however:

1. even if it is exported and usable under Windows, it is not
platform-independent;

2. its guessing success rate (at least up to IE 5.5, which I happen to
use) is not very high.

<snip>

Thanks for your reply, anyway.
-- 
TZOTZIOY, I speak England very best,
Ils sont fous ces Redmontains! (These Redmontains are crazy!) --Harddix
