using DictReader() with .decode('utf-8', 'ignore')
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Apr 14 09:48:26 EDT 2015
On Tue, 14 Apr 2015 11:37 pm, Vincent Davis wrote:
>> Which DictReader? Do you mean the one in the csv module? I will assume
>> so.
>>
> yes.
>
>
>>
>> # untested
>> with open(dfile, 'r', encoding='utf-8', errors='ignore', newline='') as f:
>>     reader = csv.DictReader(f)
>>     for row in reader:
>>         print(row['fieldname'])
>>
>
> What you have seems to work; now I need to go find my strange symbols that
> are not UTF-8 and see what happens.
> I thought that I had to open with 'rb' to use encoding?
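No. In Python 3 the rules are:
'rb' reads in binary mode and returns raw bytes without doing any decoding.
It does not accept an encoding argument at all; passing one raises
ValueError.
'r' reads in text mode and returns Unicode text, decoded with the codec
given by the encoding argument. If no encoding is specified, the default
depends on the platform: it is the locale's preferred encoding
(locale.getpreferredencoding(False)), which is often UTF-8 on Linux and
macOS but usually not on Windows.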
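To make the two modes concrete, here is a small sketch (not part of the original reply) that writes a throwaway temporary file and reads it both ways; the file contents and variable names are illustrative only. It also shows that binary mode rejects encoding=, and prints the encoding text mode would fall back to on your machine. Newer Python releases may change that fallback (UTF-8 mode), so treat the last value as specific to your version and platform.

```python
import locale
import os
import tempfile

# Write a tiny file containing UTF-8 encoded text ("café").
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write('café\n'.encode('utf-8'))
    path = tmp.name

try:
    # Binary mode: raw bytes, no decoding at all.
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode with an explicit encoding: decoded str.
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()

    # Binary mode refuses an encoding argument.
    try:
        open(path, 'rb', encoding='utf-8')
    except ValueError as err:
        binary_encoding_error = str(err)

    # The encoding text mode uses when encoding= is omitted (platform dependent).
    default_encoding = locale.getpreferredencoding(False)
finally:
    os.remove(path)

print(raw)                    # b'caf\xc3\xa9\n'
print(text)                   # café
print(binary_encoding_error)
print(default_encoding)
```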
If you are getting decoding errors when reading the file, it is possible
that the file isn't actually UTF-8. (With errors='ignore' you won't see any
errors at all: undecodable bytes are silently dropped.) One test you can do:
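with open(dfile, 'rb') as f:
    for line in f:
        try:
            # Strict decoding raises on the first invalid byte in the line.
            s = line.decode('utf-8', 'strict')
        except UnicodeDecodeError as err:
            # The message names the offending byte and its position in the line.
            print(err)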
If you need help deciphering the errors, please copy and paste them here and
we'll see what we can do.
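For hunting down the strange symbols, here is a rough sketch under stated assumptions: the sample bytes and the names find_bad_lines and read_rows are hypothetical, standing in for your real file. It reports the line number and offending bytes using the start and end attributes of UnicodeDecodeError, then reads the same data through csv.DictReader with errors='ignore' and with errors='replace' so you can compare: 'ignore' silently drops the bad byte, while 'replace' leaves a U+FFFD marker you can search for.

```python
import csv
import io

# Hypothetical sample: the last row contains byte 0xE9 ("é" in Latin-1),
# which is not valid UTF-8 in this position.
data = b'name,city\nAlice,Paris\nBob,Caf\xe9ville\n'


def find_bad_lines(raw_lines):
    """Yield (line_number, offending_bytes, reason) for lines that are not valid UTF-8."""
    for lineno, line in enumerate(raw_lines, 1):
        try:
            line.decode('utf-8', 'strict')
        except UnicodeDecodeError as err:
            yield lineno, line[err.start:err.end], err.reason


def read_rows(errors):
    """Return the 'city' column, decoding with the given error handler."""
    # TextIOWrapper adds a text layer over bytes, as open(..., encoding=..., errors=...) does.
    text = io.TextIOWrapper(io.BytesIO(data), encoding='utf-8',
                            errors=errors, newline='')
    return [row['city'] for row in csv.DictReader(text)]


bad = list(find_bad_lines(io.BytesIO(data)))
print(bad)
print(read_rows('ignore'))    # ['Paris', 'Cafville']
print(read_rows('replace'))   # second city contains U+FFFD where 0xE9 was
```

With a real file, pass open(dfile, 'rb') to find_bad_lines instead of the io.BytesIO sample.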
--
Steven