how to detect the encoding used for a specific text data ?
Jussi Piitulainen
jpiitula at ling.helsinki.fi
Thu Dec 20 09:10:08 EST 2012
iMath writes:
> which package to use ?
Read the text in as a "bytes object" (bytes), for example by opening
the file in binary mode. It has a .decode method that you can
experiment with. Strings (str) are Unicode and have an .encode method.
Both methods let you specify the encoding and what to do when there
are errors.
help(bytes.decode)
help(str.encode)
help(open)
<http://docs.python.org/3.3/library/stdtypes.html>
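
A minimal sketch of that kind of experimenting, for Python 3. The
function name try_decodings, the candidate list and the sample bytes
are made up for illustration; for a real file you would get the bytes
from open(path, "rb").read() instead.

```python
def try_decodings(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            # errors="strict" is the default: bad bytes raise an exception
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing fit: decode anyway, marking bad bytes with U+FFFD
    return None, data.decode("utf-8", errors="replace")


# Hypothetical sample: text that was saved as Latin-1, not UTF-8
sample = "naïve café".encode("latin-1")
encoding, text = try_decodings(sample)
print(encoding, text)
```

A decode that succeeds only shows that the bytes are valid in that
encoding, not that it is the one the writer used. latin-1 in
particular accepts every byte sequence, so it only makes sense as the
last candidate, and you still need to look at the result.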
In Python 2.7 and earlier, str holds bytes and does double duty: it
has both the .encode and .decode methods (as does unicode), so the
Python version matters here.
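
If you are not sure which behaviour your interpreter has, you can ask
the types themselves. A small sketch (codec_methods is a name made up
here); the comments describe what Python 3 reports.

```python
def codec_methods(tp):
    """List which of the two conversion methods a type provides."""
    return [m for m in ("encode", "decode") if hasattr(tp, m)]


print(codec_methods(str))    # Python 3: ['encode']
print(codec_methods(bytes))  # Python 3: ['decode']
```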