An attempt at guessing the encoding of a (non-unicode) string
Christos TZOTZIOY Georgiou
tzot at sil-tec.gr
Fri Apr 2 10:23:05 EST 2004
On Fri, 02 Apr 2004 15:05:42 GMT, rumours say that Jon Willeke
<j.dot.willeke at verizon.dot.net> might have written:
>Christos TZOTZIOY Georgiou wrote:
<snip>
>>
>> This could be implemented as a function in codecs.py (let's call it
>> "wild_guess"), that is based on some pre-calculated data. These
>> pre-calculated data would be produced as follows:
>...
<snip>
[Jon]
>The representative text would, in some circles, be called a training
>corpus. See the Natural Language Toolkit for some modules that may help
>you prototype this approach:
>
> <http://nltk.sf.net/>
>
>In particular, check out the probability tutorial.
Thanks for the hint; I am browsing the documentation now. However,
I'd like to create something that does not depend on external
Python libraries, so that anyone interested could just download a small
module that would do the job, hopefully well.
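
To make the idea concrete, here is a minimal sketch of what such a small, standard-library-only module might look like. It is written for present-day Python 3 and is only an illustration, not the design proposed in this thread: the "pre-calculated data" is assumed to be a plain table of character frequencies built from representative text, and the helper names build_profile and score, the candidates list, and the floor value for unseen characters are all hypothetical. Only the name wild_guess comes from the message above.

```python
import math
from collections import Counter


def build_profile(sample_text):
    """Return relative character frequencies for a representative text.

    This stands in for the "pre-calculated data" (assumed form).
    """
    counts = Counter(sample_text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def score(text, profile, floor=1e-6):
    """Average log-probability of the characters of text under profile.

    Characters missing from the profile get the small probability `floor`
    (an arbitrary, hypothetical choice).
    """
    if not text:
        return float("-inf")
    return sum(math.log(profile.get(ch, floor)) for ch in text) / len(text)


def wild_guess(data, candidates, profile):
    """Return the candidate encoding whose decoding best fits profile.

    Returns None if no candidate can decode the bytes at all.
    """
    best_name, best_score = None, float("-inf")
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        current = score(text, profile)
        if current > best_score:
            best_name, best_score = name, current
    return best_name


# Usage: a tiny Greek "training corpus" and a byte string to identify.
greek_profile = build_profile(
    "καλημέρα κόσμε, αυτό είναι ένα μικρό δείγμα ελληνικού κειμένου"
)
raw = "καλημέρα κόσμε".encode("iso8859_7")
guess = wild_guess(raw, ["latin_1", "cp1251", "iso8859_7"], greek_profile)
```

Note that latin_1 can decode any byte sequence, so a successful decode alone says little; the frequency score is what separates the candidates. A real module would need much larger sample texts, one profile per language, and probably statistics on character pairs rather than single characters.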
--
TZOTZIOY, I speak England very best,
Ils sont fous ces Redmontains! --Harddix