An attempt at guessing the encoding of a (non-unicode) string

Christos TZOTZIOY Georgiou tzot at sil-tec.gr
Fri Apr 2 10:23:05 EST 2004


On Fri, 02 Apr 2004 15:05:42 GMT, rumours say that Jon Willeke
<j.dot.willeke at verizon.dot.net> might have written:

>Christos TZOTZIOY Georgiou wrote:
<snip>
>> 
>> This could be implemented as a function in codecs.py (let's call it
>> "wild_guess"), that is based on some pre-calculated data.  These
>> pre-calculated data would be produced as follows:
>...
<snip>

[Jon]
>The representative text would, in some circles, be called a training 
>corpus.  See the Natural Language Toolkit for some modules that may help 
>you prototype this approach:
>
>   <http://nltk.sf.net/>
>
>In particular, check out the probability tutorial.

Thanks for the hint; I am browsing the documentation now.  However,
I'd like to create something that does not depend on external
Python libraries, so that anyone interested could just download a small
module that does the job, hopefully well.
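
A minimal sketch of what such a small, dependency-free module might
look like, using only the standard library.  Only the name
"wild_guess" comes from this thread.  The byte-frequency scoring, the
helper build_profile, the sample text and the two candidate codecs are
assumptions made for illustration; the actual procedure for producing
the pre-calculated data was snipped above and may well differ.

```python
import math
from collections import Counter

# Hypothetical training text; a real module would ship profiles
# pre-calculated from a much larger representative corpus.
SAMPLE = "Καλημέρα κόσμε, αυτό είναι ένα μικρό δείγμα ελληνικού κειμένου."


def build_profile(sample_text, encoding):
    """Return relative byte frequencies of sample_text in the given encoding."""
    data = sample_text.encode(encoding)
    counts = Counter(data)
    total = len(data)
    return {byte: n / total for byte, n in counts.items()}


def wild_guess(data, profiles, floor=1e-6):
    """Return the name of the profile under which `data` is most likely.

    Each byte contributes the log of its frequency in the profile;
    bytes never seen in the training text get the small `floor` value.
    """
    best_name, best_score = None, -math.inf
    for name, profile in profiles.items():
        score = sum(math.log(profile.get(byte, floor)) for byte in data)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Assumed candidates: two single-byte Greek encodings.
PROFILES = {name: build_profile(SAMPLE, name) for name in ("iso8859_7", "cp737")}

if __name__ == "__main__":
    mystery = "ελληνικά".encode("cp737")
    print(wild_guess(mystery, PROFILES))
```

With a sample this short the guess is only as good as the overlap
between the training text and the input; the point is merely that the
whole thing fits in one module with no external imports.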
-- 
TZOTZIOY, I speak England very best,
Ils sont fous ces Redmontains! --Harddix


