[issue28642] csv reader losing rows with big files and tab delimiter

Marc Garcia report at bugs.python.org
Wed Nov 9 04:11:41 EST 2016


Marc Garcia added the comment:

I was able to research the problem a bit more. This is minimal code that reproduces what happens:

    from io import StringIO
    import csv

    # The '"' right after the tab opens a quoted field that is never closed,
    # so everything up to EOF (including the second line) ends up in that field.
    csv_file = StringIO('''1\t"A
    2\tB''')

    reader = csv.reader(csv_file, delimiter='\t')
    for i, row in enumerate(reader):
        pass

    print(reader.line_num)  # 2  (physical lines read from the source)
    print(i + 1)            # 1  (rows actually returned)

The reason the right number of rows is returned with the default delimiter is that a quote has to appear immediately after the delimiter to be considered the opening of a quoted field.
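
To illustrate that point (this sketch is mine, not from the original report), the same two lines parsed with the default comma delimiter come back as two rows, because the quote does not immediately follow a delimiter:

    from io import StringIO
    import csv

    # Same data, default comma delimiter: the '"' does not come right after
    # a delimiter, so it is treated as a literal character and the reader
    # returns two rows as expected.
    csv_file = StringIO('1\t"A\n2\tB')
    rows = list(csv.reader(csv_file))

    print(len(rows))  # 2
    print(rows)       # [['1\t"A'], ['2\tB']]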

If the file contains an opening quote and EOF is reached without its closing quote, the reader considers all the text up to EOF to be that field.
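
To make the tab-delimited case concrete (again my own sketch), the single row that is returned shows the field running all the way to EOF:

    from io import StringIO
    import csv

    # Tab delimiter: the quote opens a field right after the delimiter and is
    # never closed, so the field absorbs the newline and the whole second line.
    csv_file = StringIO('1\t"A\n2\tB')
    rows = list(csv.reader(csv_file, delimiter='\t'))

    print(len(rows))  # 1
    print(rows[0])    # ['1', 'A\n2\tB']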

This would work as expected in a line like:

    1,"well quoted text","this one has a missing quote

But it fails silently, with unexpected results, in all other cases. I'd expect csv to raise an exception rather than keep the current behavior.
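
For what it's worth, the dialect's strict option is documented to raise csv.Error on bad CSV input, and if I remember correctly it also covers an opening quote that is never closed before EOF; I haven't verified that against this exact case, so treat the sketch below as an assumption rather than a confirmed workaround:

    from io import StringIO
    import csv

    csv_file = StringIO('1\t"A\n2\tB')
    reader = csv.reader(csv_file, delimiter='\t', strict=True)

    try:
        rows = list(reader)
    except csv.Error as exc:
        # Assumed behaviour: the unterminated quoted field at EOF is reported
        # instead of being silently returned as data.
        print('csv.Error:', exc)

Even if strict does cover this, the silent behaviour of the default mode still seems worth addressing.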

Do you agree? Should I create another issue to address this?

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28642>
_______________________________________

