[New-bugs-announce] [issue37984] Unable parse csv on latin iso or binary mode

Yhojann Aguilera report at bugs.python.org
Thu Aug 29 18:17:04 EDT 2019


New submission from Yhojann Aguilera <yhojann.aguilera at gmail.com>:

Unable to parse a CSV file with a Latin ISO charset.

import csv

with open('./exported.csv', newline='') as csvFileHandler:
    csvHandler = csv.reader(csvFileHandler, delimiter=';', quotechar='"')
    for line in csvHandler:
        print(line)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 1032: invalid continuation byte
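When `encoding` is omitted, text-mode `open()` uses the locale's preferred encoding (`locale.getpreferredencoding(False)`), which judging by the error is UTF-8 here. The minimal sketch below reproduces the failure and shows one way around it, assuming the file is really Latin-1 (ISO-8859-1). The sample row and the temporary file are hypothetical stand-ins for `exported.csv`:

```python
import csv
import os
import tempfile

# Hypothetical sample data: a Latin-1 encoded row containing "Ñ" (byte 0xD1).
sample = 'id;name\r\n1;"MUÑOZ"\r\n'.encode("latin-1")

fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(sample)

# Decoding as UTF-8 fails: 0xD1 starts a two-byte sequence,
# but the following byte ('O', 0x4F) is not a continuation byte.
try:
    with open(path, encoding="utf-8", newline="") as f:
        list(csv.reader(f, delimiter=";", quotechar='"'))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as Latin-1 succeeds: every byte 0x00-0xFF maps to a character.
with open(path, encoding="latin-1", newline="") as f:
    rows = list(csv.reader(f, delimiter=";", quotechar='"'))

os.remove(path)
print(utf8_ok)  # False
print(rows)     # [['id', 'name'], ['1', 'MUÑOZ']]
```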

I tried opening the file in binary mode, but open() says: binary mode doesn't take a newline argument. So I passed the newline as bytes, newline=b'', but then it says: open() argument 6 must be str or None, not bytes. After removing the newline argument, csv fails with: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?).
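The csv module only accepts text, but a file opened in binary mode can still be used by wrapping it in `io.TextIOWrapper`, which decodes the bytes and accepts the `newline` argument that binary `open()` rejects. This is a sketch assuming Latin-1 is the right encoding; `io.BytesIO` with a hypothetical sample row stands in for the real binary file handle:

```python
import csv
import io

# Stand-in for open('./exported.csv', 'rb'); the content is a hypothetical Latin-1 sample.
binary_file = io.BytesIO(b'id;name\r\n2;"PE\xd1A"\r\n')

# TextIOWrapper decodes the byte stream; newline='' leaves CRLF handling to csv.
text_file = io.TextIOWrapper(binary_file, encoding="latin-1", newline="")
rows = list(csv.reader(text_file, delimiter=";", quotechar='"'))
print(rows)  # [['id', 'name'], ['2', 'PEÑA']]
```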

So the csv module does not support reading in binary mode. Next I tried the Latin ISO charset:

with open('./exported.csv', mode='r', encoding='ISO-8859', newline='') as csvFileHandler:

UnicodeDecodeError: 'charmap' codec can't decode byte 0xd1 in position 1032: character maps to <undefined>

But the file is in a Latin ISO charset:

$ file exported.csv 
exported.csv: ISO-8859 text, with very long lines, with CRLF line terminators

So I changed it to ISO-8859-8:

UnicodeDecodeError: 'charmap' codec can't decode byte 0xd1 in position 1032: character maps to <undefined>
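`file` only reports the family "ISO-8859", not a specific member, and ISO-8859-8 (Hebrew) leaves byte 0xD1 undefined. A quick way to compare candidate codecs is to decode the offending byte with each one; the candidate list below is an assumption, not taken from the report:

```python
# Try decoding the offending byte 0xD1 with several candidate codecs.
candidates = ["utf-8", "iso-8859-1", "iso-8859-8", "cp1252"]
results = {}
for name in candidates:
    try:
        results[name] = b"\xd1".decode(name)
    except UnicodeDecodeError as exc:
        # exc.reason holds the codec's explanation of the failure.
        results[name] = "error: " + exc.reason

for name, outcome in results.items():
    print(name, "->", outcome)
```

ISO-8859-1 and cp1252 both decode 0xD1 as "Ñ", while UTF-8 and ISO-8859-8 reject it.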

I am unable to load the file. Why not offer an option to work in binary mode? The delimiters can be represented as byte values.
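Without a binary mode in csv, one common workaround is to decode with Latin-1, which maps each byte 0x00-0xFF to the code point with the same value. Parsing then effectively works on byte values, and each field can be encoded back to its exact original bytes. This sketch assumes the real encoding is ASCII-compatible (so delimiter, quote and newline bytes never occur inside a multibyte character); the helper name `read_csv_bytes` is hypothetical:

```python
import csv
import io


def read_csv_bytes(data: bytes, delimiter: str = ";", quotechar: str = '"'):
    """Hypothetical helper: parse CSV bytes and return each field as bytes.

    Latin-1 maps every byte to the code point of the same value, so decoding
    and re-encoding with it never fails and round-trips the bytes exactly.
    Only safe when the real encoding is ASCII-compatible.
    """
    text = io.StringIO(data.decode("latin-1"), newline="")
    reader = csv.reader(text, delimiter=delimiter, quotechar=quotechar)
    return [[field.encode("latin-1") for field in row] for row in reader]


rows = read_csv_bytes(b'id;name\r\n3;"I\xd1IGO"\r\n')
print(rows)  # [[b'id', b'name'], [b'3', b'I\xd1IGO']]
```

The fields come back as raw bytes, so the caller can decode them later once the correct charset is known.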

----------
components: Unicode
messages: 350836
nosy: Yhojann Aguilera, ezio.melotti, vstinner
priority: normal
severity: normal
status: open
title: Unable parse csv on latin iso or binary mode
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37984>
_______________________________________

