read from file with mixed encodings in Python3

Jaroslav Dobrek jaroslav.dobrek at gmail.com
Mon Nov 7 09:23:12 EST 2011


Hello,

in Python3, I often have this problem: I want to do something with
every line of a file, and I assume that every line is encoded in
UTF-8 (Python3's open() takes its default encoding from the locale,
which is UTF-8 on many systems). If a line isn't valid UTF-8, I would
like Python3 to do something specific with it (like skipping the
line, writing it to standard error, ...).

Like so:

try:
    ...
except UnicodeDecodeError:
    ...

But there is no obvious place for this construction. If I simply do:

for line in f:
    print(line)

this will result in a UnicodeDecodeError as soon as some line is not
valid UTF-8, and I can't tell Python3 what to do with that line instead:

This will not work:

for line in f:
    try:
        print(line)
    except UnicodeDecodeError:
        ...

because the UnicodeDecodeError is raised in the "for line in f" part
(while the file object decodes the next line), not in the loop body.
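A small self-contained check illustrates this. It is only a sketch: an in-memory io.TextIOWrapper over io.BytesIO stands in for open(path, encoding="utf-8"), an assumption made so that no file on disk is needed. The try inside the loop body never fires; only a try around the whole loop catches the error.

```python
import io

# Two lines: the first is valid UTF-8, the second contains the byte 0xff,
# which can never start a UTF-8 sequence.
data = b"first line\n\xff second line\n"

# Stand-in for open(path, encoding="utf-8"): a text layer over raw bytes.
f = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")

caught_in_body = False
caught_around_loop = False
try:
    for line in f:  # decoding happens here, inside next(f)
        try:
            print(line, end="")
        except UnicodeDecodeError:
            caught_in_body = True  # never reached: print() does not decode
except UnicodeDecodeError as exc:
    caught_around_loop = True
    print("decoding failed:", exc.reason)

print(caught_in_body, caught_around_loop)
```

Because the text layer decodes in chunks rather than line by line, the error may even surface before earlier, valid lines of the same chunk have been yielded, so catching it around the loop does not let you skip just the bad line and resume.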

How can I catch such exceptions?
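One common workaround, sketched here under assumptions: open the file in binary mode, so that iterating yields raw bytes lines and nothing is decoded by the for statement, then decode each line yourself inside the try. The generator name utf8_lines and the choice to report bad lines on standard error are illustrative, not part of any library.

```python
import sys
from typing import Iterator


def utf8_lines(path: str) -> Iterator[str]:
    """Yield the lines of `path` that are valid UTF-8 (hypothetical helper).

    Lines that fail to decode are reported on standard error and skipped.
    """
    # Binary mode: iteration yields bytes split on b"\n"; nothing is decoded yet.
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                # This is where the try/except can go: one line at a time.
                text = raw.decode("utf-8")
            except UnicodeDecodeError as exc:
                print(f"line {lineno}: not UTF-8 ({exc.reason}): {raw!r}",
                      file=sys.stderr)
                continue  # skip the offending line
            yield text
```

Usage would be `for line in utf8_lines("data.txt"): print(line, end="")`, where data.txt is a placeholder path. Note that binary mode does not translate "\r\n", so Windows line endings stay in the decoded lines, and splitting on b"\n" assumes an encoding in which that byte always means newline (true for UTF-8 and single-byte encodings, not for UTF-16).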

Note that recoding the file before opening it is not an option,
because the files often contain strings in many different encodings.
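Since recoding the whole file is ruled out, the same per-line try/except can be extended to try several encodings in turn. This is a sketch assuming a hypothetical list of candidate encodings; the order matters, and a successful decode only shows that the bytes are valid in that encoding, not that it is the right one. In particular latin-1 accepts every byte sequence, so if it is included it must come last and it will never fail.

```python
from typing import Optional, Sequence

# Hypothetical candidates; adjust to the encodings you actually expect.
CANDIDATES = ("utf-8", "cp1252")


def decode_line(raw: bytes, encodings: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return `raw` decoded with the first encoding that accepts it, else None."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    return None  # no candidate worked: the caller decides (skip, log, ...)
```

With the file opened in binary mode (`open(path, "rb")`), each raw line can be passed through decode_line() and a None result handled as desired, for example by writing the raw bytes to standard error.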

Jaroslav


