UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

Chris Angelico rosuav at gmail.com
Tue May 29 05:46:24 EDT 2018


On Tue, May 29, 2018 at 6:15 PM, Peter J. Holzer <hjp-python at hjp.at> wrote:
> So if the text is German it will contain more words with
> umlauts and each byte which is part of a correctly spelled German word
> when interpreted according to ISO-8859-1 increases the probability that
> decoding with ISO-8859-1 will produce the correct result. There remains
> a tiny probability that all those matches are mere coincidence, but I
> wrote "almost always", not "always", so I can live with an error rate of
> 0.000001% (or something like that).

That's basically what the chardet module does, and its error rate is
far FAR higher than that. If you think it's easy to detect encodings,
I'm sure the chardet maintainers will be happy to accept pull
requests!
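
To make the difficulty concrete, here is a minimal stdlib-only sketch. It is not chardet's algorithm. It only checks which candidate encodings decode a byte string without raising. The helper name candidate_decodings and the choice of candidate encodings are assumptions made for illustration.

```python
def candidate_decodings(data, encodings=("utf-8", "cp1252", "iso-8859-1")):
    """Return {encoding: text} for each candidate that decodes without error.

    Hypothetical helper for illustration only; it does no statistical
    analysis of the decoded text.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            # This codec has no mapping for at least one byte sequence.
            pass
    return results


# German text stored as ISO-8859-1: strict UTF-8 rejects it, but cp1252
# and ISO-8859-1 both accept it and yield the same text.
german = "Grüße aus Köln".encode("iso-8859-1")
print(sorted(candidate_decodings(german)))  # ['cp1252', 'iso-8859-1']

# A right double quotation mark (U+201D) stored as UTF-8 ends in byte 0x9d.
# Python's cp1252 codec leaves 0x9d undefined, so it raises the error from
# the subject line. ISO-8859-1 maps every byte, so it "succeeds" and
# silently produces mojibake.
quote = "”".encode("utf-8")
print(sorted(candidate_decodings(quote)))  # ['iso-8859-1', 'utf-8']
```

Decoding without an exception only rules encodings out. ISO-8859-1 never fails on any input, so picking among the survivors needs some statistical model of the text, and that is where the guessing (and the error rate) comes in.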

ChrisA


