Detect character encoding

The new guy not at interesting.com
Mon Dec 5 23:59:49 EST 2005


Michal wrote:

> Hello,
> is there any way to detect string encoding in Python?
> 
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc). I want to detect it, and
> encode it to utf-8 (with the string method encode).

Well, I can't help much with how to detect it in Python itself. My first
guess, though, would be to have a look at the source code of the "file"
utility. This is an example of what it does:

# ls
de.i18n  en.i18n
# file *
de.i18n: ISO-8859 text, with very long lines
en.i18n: ISO-8859 English text, with very long lines
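
On the Python side, one rough approach is trial decoding: try each
candidate charset in turn and keep the first one that decodes cleanly.
The sketch below is only a heuristic, not what "file" does. The function
names (guess_encoding, to_utf8) and the candidate order are my own
assumptions. It also assumes that real text does not contain C1 control
characters, which is what lets it tell cp1250 apart from iso-8859-2 when
bytes in the 0x80-0x9F range are present. Text that avoids that range
will be reported as iso-8859-2 even if it was written as cp1250.

```python
def guess_encoding(data, candidates=("utf-8", "iso-8859-2", "cp1250")):
    """Return the first candidate charset that decodes `data` cleanly.

    "Cleanly" means no decode error and no C1 control characters
    (U+0080..U+009F) in the result. Returns None if nothing fits.
    """
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue
        # iso-8859-2 maps bytes 0x80-0x9F to C1 controls, which are
        # very unlikely in real text; treat them as a wrong guess.
        if any("\x80" <= ch <= "\x9f" for ch in text):
            continue
        return name
    return None


def to_utf8(data, candidates=("utf-8", "iso-8859-2", "cp1250")):
    """Decode `data` with the guessed charset and re-encode as UTF-8."""
    name = guess_encoding(data, candidates)
    if name is None:
        raise ValueError("could not guess the encoding")
    return data.decode(name).encode("utf-8")
```

To use it, read each file in binary mode (open(path, "rb")), pass the
bytes to to_utf8, and write the result back in binary mode. A guess like
this can be wrong for short inputs, so it is worth spot-checking the
output.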

cheers


