catch UnicodeDecodeError

jaroslav.dobrek at gmail.com
Wed Jul 25 07:05:28 EDT 2012


Hello,

very often I have the following problem: I write a program that processes many files, which it assumes to be encoded in UTF-8. Then, some day, there is a non-UTF-8 character in one of several hundred or thousand (new) files. The program exits with an error message like this:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 60: invalid continuation byte

I usually solve the problem by moving files around and by recoding them.

What I really want to do is use something like

try:
    # open file, read line, or do something else, I don't care
except UnicodeDecodeError:
    sys.exit("Found a bad char in file " + file + " line " + str(line_number))

Yet, no matter where I put this try-except, it doesn't work.

How should I use try-except with UnicodeDecodeError?

Jaroslav
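
One possible reason the try-except appears to have no effect is that the decoding does not happen at the open() call but later, while lines are being read, so the try block has to enclose the place where bytes are actually decoded. The following is only a minimal sketch of that idea, assuming Python 3; the function names find_first_bad_line and check_files are hypothetical and not part of the original program. It reads each file in binary mode and decodes every line explicitly, so that the file name and line number are known when the error is raised.

```python
import sys


def find_first_bad_line(path):
    """Return (line_number, error) for the first line that is not valid
    UTF-8, or None if the whole file decodes cleanly.

    Hypothetical helper: the file is opened in binary mode, so no decoding
    happens until the explicit decode() call inside the try block.
    """
    with open(path, "rb") as handle:
        for line_number, raw_line in enumerate(handle, start=1):
            try:
                raw_line.decode("utf-8")
            except UnicodeDecodeError as error:
                return line_number, error
    return None


def check_files(paths):
    """Exit with a message naming the first file and line that fail to decode."""
    for path in paths:
        bad = find_first_bad_line(path)
        if bad is not None:
            line_number, error = bad
            sys.exit(
                "Found a bad char in file %s line %d (%s)"
                % (path, line_number, error.reason)
            )


if __name__ == "__main__":
    # Usage (assumed): python check_utf8.py file1.txt file2.txt ...
    check_files(sys.argv[1:])
```

Splitting on newline bytes before decoding should be safe for this purpose, because the byte 0x0A never occurs inside a UTF-8 multi-byte sequence. If the program should skip or log bad files instead of exiting, the return value of find_first_bad_line can be used directly rather than calling sys.exit.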


