csv.DictReader line skipping should be considered a bug?

Tue Dec 5 07:43:02 EST 2017

On 2017-12-06 00:06, Steve D'Aprano wrote:
> On Wed, 6 Dec 2017 04:20 am, Jason wrote:
>
>> I ran into this:
>>
>> https://stackoverflow.com/questions/27707581/why-does-csv-dictreader-skip-empty-lines
>>
>> # unlike the basic reader, we prefer not to return blanks,
>> # because we will typically wind up with a dict full of None
>> # values
>>
>> while iterating over two files, which are line-by-line corresponding. The
>> DictReader skipped ahead many lines breaking the line-by-line
>> correspondence.
>
> Um... this doesn't follow. If they are line-by-line corresponding, then they
> should skip the same number of blank lines and read the same number of
> non-blank lines.
>
> Even if one file has blanks and the other does not, if you iterate over
> the records themselves, they should keep their correspondence.
>
> I'm afraid that if you want to convince me this is a buggy design, you need to
> demonstrate a simple pair of CSV files where the non-blank lines are
> corresponding (possibly with differing numbers of blanks in between) but the
> CSV readers get out of alignment somehow.
>
>
>> And I want to argue that the difference of behavior should be considered a
>> bug. It should be considered as such because: 1. I need to know what's in
>> the file to know what class to use.
>
> Sure. But blank lines don't tell you what class to use.
>
>> The file content should not break at-least-1-record-per-line.
>
> Blank lines DO break that requirement. A blank line is not a record.
>
>
>> There may be multiple lines per record in the
>> case of embedded newlines, but there should never be a line with no record.
>
> I disagree. A blank line is not a record. If I have (say) five fields, then:
>
> ,,,,\n
>
> is a blank record with five empty fields. \n alone is just a blank. The
> DictReader correctly returns records with blank fields.
>
A blank line could be a record if there's only one field and it's empty.
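
To make that concrete, here is a minimal sketch of the single-field case.
The sample data and the helper name records_with_line_numbers are made up
for illustration; they are not taken from Jason's files. It shows
csv.reader returning an empty list for the blank line while csv.DictReader
skips it, and one possible way to keep line numbers alongside each record
by using the plain reader's line_num attribute.

```python
import csv
import io

# Hypothetical single-column file whose second data line is blank.
data = "name\nalice\n\nbob\n"

# The basic reader yields an empty list for the blank line.
plain_rows = list(csv.reader(io.StringIO(data)))

# DictReader skips rows that come back as an empty list.
dict_rows = list(csv.DictReader(io.StringIO(data)))


def records_with_line_numbers(text):
    """Yield (line_number, record) pairs without dropping blank lines.

    Hypothetical helper: builds the dicts by hand from csv.reader so that
    a blank line still produces an (empty) record.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    for row in reader:
        # line_num is the number of source lines read so far.
        yield reader.line_num, dict(zip(header, row))


numbered = list(records_with_line_numbers(data))

print(plain_rows)
print(dict_rows)
print(numbered)
```

Whether the blank line should count as a record with one empty field is
exactly the point in dispute; the helper above just keeps it visible so the
caller can decide. Note that it yields an empty dict for a blank line rather
than filling the field with None as DictReader does for short rows.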

[snip]