Determining the encoding of a text file
David Opstad
opstad at batnet.com
Mon Mar 1 10:47:23 EST 2004
In article <85b5e3f8.0403010224.939e8f8 at posting.google.com>,
rajorshi at fastmail.fm (Rajorshi) wrote:
> How do I determine the encoding of a text file? That is,
> given a text file, I want to know which encoding it is in:
> UTF-8, UTF-16, Latin, etc. It would be very helpful if
> you could tell me how to do this in Python on Linux, but
> just the method is acceptable.
If the first byte in the file is 0xFE and the second is 0xFF, then it's
likely the file is encoded in big-endian UTF-16. If the first byte is
0xFF and the second is 0xFE, then it's likely to be little-endian UTF-16.
(That two-byte prefix is the byte order mark, U+FEFF, as it appears in
each byte order.) Once you've eliminated those possibilities, it gets
trickier...
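
Here is a rough sketch of that two-byte check in Python. The function
names guess_utf16_from_bom and guess_file_encoding are made up for
illustration. The BOM constants come from the standard codecs module.
A match is only a hint, not proof: a UTF-32 little-endian file, for
example, also starts with 0xFF 0xFE.

```python
import codecs


def guess_utf16_from_bom(head):
    """Return 'utf-16-be', 'utf-16-le', or None from the leading bytes.

    Only the first two bytes are examined. A match means "likely",
    not "certainly": other data can begin with the same bytes.
    """
    if head.startswith(codecs.BOM_UTF16_BE):  # bytes 0xFE 0xFF
        return "utf-16-be"
    if head.startswith(codecs.BOM_UTF16_LE):  # bytes 0xFF 0xFE
        return "utf-16-le"
    # No UTF-16 byte order mark: this is where it gets trickier.
    return None


def guess_file_encoding(path):
    """Read the first two bytes of the file at `path` and check them."""
    with open(path, "rb") as f:  # binary mode, so no decoding happens
        return guess_utf16_from_bom(f.read(2))
```

Calling guess_file_encoding("somefile.txt") (a placeholder filename)
returns None when neither mark is present, which says nothing about
what the encoding actually is.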
Dave