[Tutor] Decoding from strange symbols

Steven D'Aprano steve at pearwood.info
Wed Jan 19 15:13:59 CET 2011


Oleg Oltar wrote:
> Hi,
> 
> I am trying to decode a string I took from file:
[...]
> How do I convert this to something human readable?

In general, you can't unless you know the encoding. A file filled with 
arbitrary bytes could be anything.

However, you can sometimes guess the encoding, either by looking at the 
bytes and reasoning carefully, as Peter Otten did when he suggested your 
file was UTF-16, or by statistical analysis, or by some combination of 
the two. Guessing encodings is pretty much a black art, so if you need 
to do this a lot you should use an existing package like this one:

http://chardet.feedparser.org/
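
One cheap check you can do by hand before reaching for a full detector 
is to look for a byte order mark (BOM) at the start of the data. The 
sketch below uses only the BOM constants from the standard codecs 
module. The name sniff_bom is made up for illustration, and this is not 
a description of how the package above works. It only helps when the 
file actually starts with a BOM; plenty of files don't.

```python
import codecs

# BOM prefixes paired with a codec that consumes the BOM while decoding.
# The UTF-32 marks are checked first because the UTF-32 little-endian
# mark begins with the same two bytes as the UTF-16 little-endian mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]

def sniff_bom(data):
    """Return a codec name if data starts with a known BOM, else None.

    sniff_bom is a hypothetical helper, not part of any library.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

If sniff_bom returns None you are back to guessing, which is where a 
statistical detector earns its keep.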

Once you have the encoding, or at least a guess for the encoding:

data = open(filename, 'rb').read()  # binary mode: raw bytes, not text
text = data.decode(encoding)

or use the codecs module, as Peter showed.
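
If you have more than one plausible guess, you can try them in order and 
keep the first one that decodes without an error. This is only a sketch: 
decode_with_guesses and its default candidate list are my own 
assumptions, not something from the original question. Bear in mind 
that a decode which succeeds is not proof the guess was right, since 
some encodings (latin-1, for one) accept any bytes at all, so put the 
strictest candidates first and look at the result.

```python
def decode_with_guesses(data, candidates=('utf-8', 'utf-16')):
    """Try each candidate encoding in turn; return (text, encoding).

    Hypothetical helper: raises ValueError if no candidate fits.
    """
    for encoding in candidates:
        try:
            # Strict decoding, so a wrong guess fails loudly instead of
            # silently producing replacement characters.
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError('none of the candidate encodings fit: %r'
                     % (candidates,))
```

Usage would look like:

    text, encoding = decode_with_guesses(open(filename, 'rb').read())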


-- 
Steven


