Determining the encoding of a text file

Rajorshi rajorshi at fastmail.fm
Tue Mar 2 12:02:35 EST 2004


Thanks for your suggestions!


"J.R." <j.r.gao at motorola.com> wrote in message news:<c20r4m$jn$1 at newshost.mot.com>...
> "Rajorshi" <rajorshi at fastmail.fm> wrote in message
> news:85b5e3f8.0403010224.939e8f8 at posting.google.com...
> > Hello!
> >  How do I determine the encoding of a text file? That is,
> > given a text file, I want to know which encoding it is in:
> > UTF8, UTF16, Latin, etc. It would be very helpful if
> > you could tell me how to do this in Python on Linux. But
> > just the method is acceptable.
> > Thanks in advance!
> 
> The Python integrated development environment IDLE, which is distributed
> along with Python, shows one approach to decoding a string. You can find
> it in the file $PYTHON/lib/idlelib/IOBinding.py; look for the decode()
> method.
> 
> It's not perfect, though. You could combine it with Skip's example to
> write your own.
> Additionally, if you want to guess a Chinese encoding, the Perl library at
> http://www.mandarintools.com/download/codelib.zip
> may be a useful reference; it supports GB2312-80, Hz, Big5, UTF-8, etc.
> 
> J.R.
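
One common way to approach the original question is to look for a byte
order mark first and then try a few strict decoders in turn. The following
is only a minimal sketch of that idea, not the code from IOBinding.py or
from Skip's example. The function name guess_encoding, the fallbacks
parameter and the Latin-1 last resort are all assumptions made for
illustration.

```python
import codecs

# Byte order marks, checked longest-first so that a UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for a UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, fallbacks=("ascii", "utf-8")):
    """Return a plausible codec name for the given bytes (a guess only)."""
    # Step 1: a BOM, if present, is the strongest hint available.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Step 2: try strict decoders; the first one that succeeds wins.
    for name in fallbacks:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # Step 3 (assumption): fall back to Latin-1, which accepts any bytes.
    return "latin-1"
```

To use it on a file, read the file in binary mode and pass the bytes in,
for example guess_encoding(open(path, "rb").read()). Because Latin-1 can
decode any byte sequence, a "latin-1" result only means that nothing more
specific matched; telling apart the various 8-bit or Chinese encodings
needs statistical heuristics like the ones in the library mentioned above.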


