Detect character encoding

Diez B. Roggisch deets at nospam.web.de
Sun Dec 4 10:24:03 EST 2005


Michal wrote:
> Hello,
> is there any way to detect string encoding in Python?
> 
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc.). I want to detect it and
> convert the file to utf-8 (with the string method encode).

You can only guess, e.g. by looking for words that contain characters 
such as umlauts and checking whether they decode to something sensible. 
Recode might be of help here; AFAIK it has such heuristics built in.
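
A rough sketch of such a guess in Python 3, not a definitive implementation: 
decode the bytes with each candidate charset and prefer the one that yields 
the most letters you expect in the language. The names guess_encoding, 
to_utf8 and CZECH_LETTERS are made up for illustration, and the letter set 
assumes the files contain Czech text; adapt both to your data.

```python
# Sketch only: guess_encoding, to_utf8 and CZECH_LETTERS are made-up names,
# and the letter set assumes the files contain Czech text.
CZECH_LETTERS = set("áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ")


def guess_encoding(data, candidates=("iso-8859-2", "cp1250"),
                   expected=CZECH_LETTERS):
    """Return the candidate whose decoding looks most like the expected text."""
    best, best_score = None, float("-inf")
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding at all
        non_ascii = [ch for ch in text if ord(ch) > 127]
        hits = sum(1 for ch in non_ascii if ch in expected)
        # Reward expected letters, penalise every other non-ASCII character.
        score = hits - (len(non_ascii) - hits)
        if score > best_score:
            best, best_score = enc, score
    return best


def to_utf8(data):
    """Re-encode raw bytes as UTF-8 using the guessed source encoding."""
    enc = guess_encoding(data)
    if enc is None:
        raise ValueError("no candidate encoding fits")
    return data.decode(enc).encode("utf-8")
```

Pure ASCII input scores the same under every candidate, so the first one 
in the list wins; the result is still only a guess.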

But there is _no_ way to be absolutely sure. 8 bits are 8 bits, so a file 
is "legal" in practically every single-byte encoding.
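
To illustrate with the two charsets from the question (a minimal Python 3 
example; the variable names are arbitrary): the same byte decodes without 
error in both, it just comes out as a different letter.

```python
raw = b"\xb9"  # one byte, with no indication of which charset produced it

as_latin2 = raw.decode("iso-8859-2")  # U+0161, s with caron
as_cp1250 = raw.decode("cp1250")      # U+0105, a with ogonek

# Both decodes succeed; nothing in the byte itself says which one is right.
print(repr(as_latin2), repr(as_cp1250))
```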


Diez


