[Chicago] Chardet help

Clyde Forrester clydeforrester at gmail.com
Sun Mar 10 18:38:23 CET 2013


According to Firefox, the encoding is windows-1252.
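
If Firefox is right, decoding with that codec explicitly is probably simpler than guessing. Here is a minimal sketch, not tested against the actual file: `read_text` is just a name I made up, and the sample bytes are invented Italian words rather than anything taken from the corpus. It also shows why a detector can plausibly land on ISO-8859-2: both are single-byte codecs, so the same bytes decode without error under either one, and only the resulting letters differ.

```python
import io

def read_text(path, encoding="windows-1252"):
    """Read a file and decode it with an explicitly chosen codec."""
    # io.open takes an encoding argument on both Python 2 and Python 3.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()

# Made-up sample: "perché città più" as windows-1252 bytes.
sample = b"perch\xe9 citt\xe0 pi\xf9"

# Neither decode raises, so "it decodes cleanly" proves nothing by itself.
as_cp1252 = sample.decode("windows-1252")
as_latin2 = sample.decode("iso-8859-2")

# The two results differ only in the accented letters.
print(as_cp1252 == as_latin2)
```

Looking at the accented words in the decoded text is usually the quickest way to tell which codec was the right one.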

On 3/10/2013 10:00 AM, Tathagata Dasgupta wrote:
> Good morning Chipy,
> Some encoding foo to spoil the Sunday morning motivational beverage!
>
> I am trying to read a file
> (https://dl.dropbox.com/u/18146922/uniq_words_in_corpus.txt) written
> in Italian, and after a bit of trial and error decided to go with
> chardet.
>
>
> import chardet
>
> def getEncoding(infile):
>     # Read raw bytes: chardet works on undecoded input.
>     with open(infile, "rb") as f:
>         rawdata = f.read()
>     result = chardet.detect(rawdata)
>     charenc = result['encoding']
>     print(charenc)
>     return charenc
>
> That gives me ISO-8859-2.
>
