Determining the encoding of a text file

J.R. j.r.gao at motorola.com
Mon Mar 1 21:20:21 EST 2004


"Rajorshi" <rajorshi at fastmail.fm> wrote in message
news:85b5e3f8.0403010224.939e8f8 at posting.google.com...
> Hello!
>  How do I determine the encoding of a text file? That is,
> given a text file, I want to know which encoding it is in:
> UTF8, UTF16, Latin, etc. It would be very helpful if
> you could tell me how to do this in python on Linux. But
> just the method is acceptable.
> Thanks in advance!

IDLE, the Python integrated development environment that is distributed
along with Python, shows one approach to decoding a string. You can find
it in the file $PYTHON/lib/idlelib/IOBinding.py; look for decode().

It's not perfect, though; you could combine it with Skip's example to
write your own.
Additionally, if you want to guess a Chinese encoding, the Perl library at
http://www.mandarintools.com/download/codelib.zip
may be a useful reference; it supports GB2312-80, Hz, Big5, UTF-8, etc.
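
As a rough illustration of the general idea (this is not the IDLE code and
not Skip's example, just a minimal sketch), one common heuristic is to look
for a byte order mark first and then try a few candidate encodings in
order. The function name guess_encoding, the candidate list, and the
Latin-1 fallback are my own assumptions; Latin-1 accepts any byte sequence,
so falling through to it only means "nothing stricter matched".

```python
import codecs

# Byte order marks, with the UTF-32 marks listed before the UTF-16 ones:
# the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("ascii", "utf-8")):
    """Return a plausible codec name for the bytes in data (a guess only)."""
    # Step 1: a BOM at the start of the data is the strongest hint.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Step 2: try strict decoders in order; the first that succeeds wins.
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # Step 3: fallback (an assumption). Latin-1 never fails to decode,
    # so this says nothing about what the text really is.
    return "latin-1"
```

To use it on a file, read the contents in binary mode (open(path, "rb"))
and pass the resulting bytes to guess_encoding(). It cannot tell apart
encodings that share the same valid byte ranges (Latin-1 vs. other 8-bit
code pages, for instance); that requires statistical guessing of the kind
the Perl library above does for Chinese.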

J.R.




