catch UnicodeDecodeError

Thu Jul 26 07:15:37 EDT 2012

Jaroslav Dobrek, 26.07.2012 12:51:
>>> try:
>>>     for line in f: # here text is decoded implicitly
>>>        do_something()
>>> except UnicodeDecodeError():
>>>     do_something_different()
> 
> the code above (without the brackets) is semantically bad: The
> exception is not caught.

Sure it is. Just to repeat myself: if the above doesn't catch the
exception, then the exception did not originate from the place where you
think it did. Again: look at the traceback.

>>> The problem is that vast majority of the thousands of files that I
>>> process are correctly encoded. But then, suddenly, there is a bad
>>> character in a new file. (This is so because most files today are
>>> generated by people who don't know that there is such a thing as
>>> encodings.) And then I need to rewrite my very complex program just
>>> because of one single character in one single file.
>>
>> Why would that be the case? The places to change should be very local in
>> your code.
> 
> This is the case in a program that has many different functions which
> open and parse different
> types of files. When I read and parse a directory with such different
> types of files, a program that
> uses
> 
> for line in f:
> 
> will not exit with any hint as to where the error occurred. I just
> exits with a UnicodeDecodeError.

... that tells you the exact code line where the error occurred. No need to
look around.

Stefan