csv.DictReader line skipping should be considered a bug?

Steve D'Aprano steve+python at pearwood.info
Tue Dec 5 19:06:39 EST 2017


On Wed, 6 Dec 2017 04:20 am, Jason wrote:

> I ran into this:
> https://stackoverflow.com/questions/27707581/why-does-csv-dictreader-skip-empty-lines
> 
> # unlike the basic reader, we prefer not to return blanks,
> # because we will typically wind up with a dict full of None
> # values
> 
> while iterating over two files, which are line-by-line corresponding. The
> DictReader skipped ahead many lines breaking the line-by-line
> correspondence.

Um... this doesn't follow. If they are line-by-line corresponding, then they
should skip the same number of blank lines and read the same number of
non-blank lines.

Even if one file has blanks and the other does not, if you iterate over
the records themselves, they should keep their correspondence.

I'm afraid that if you want to convince me this is a buggy design, you need to
demonstrate a simple pair of CSV files where the non-blank lines are
corresponding (possibly with differing numbers of blanks in between) but the
CSV readers get out of alignment somehow.
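
For what it's worth, here is a minimal sketch of the kind of test I have in
mind. The two in-memory "files" and their column names are made up for
illustration; they have the same records but different numbers of blank lines
between them. Iterating over the records with zip() keeps them paired up:

```python
import csv
import io

# Hypothetical sample data: same records, different numbers of blank lines.
file_a = "id,name\n1,ann\n\n2,bob\n\n\n3,cat\n"
file_b = "id,score\n1,10\n2,20\n\n3,30\n"

reader_a = csv.DictReader(io.StringIO(file_a))
reader_b = csv.DictReader(io.StringIO(file_b))

# Pair the records up; blank lines are skipped by both readers.
pairs = list(zip(reader_a, reader_b))

for rec_a, rec_b in pairs:
    # Each pair should refer to the same id.
    print(rec_a["id"], rec_a["name"], rec_b["score"])
```

A counter-example would be a pair of files like these where the ids in each
pair come out different.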


> And I want to argue that the difference of behavior should be considered a
> bug. It should be considered as such because: 1. I need to know what's in
> the file to know what class to use.

Sure. But blank lines don't tell you what class to use.

> The file content should not break at-least-1-record-per-line.

Blank lines DO break that requirement. A blank line is not a record.


> There may be multiple lines per record in the
> case of embedded new lines, but it should never no record per line.

I disagree. A blank line is not a record. If I have (say) five fields, then:

,,,,\n

is a blank record with five empty fields. \n alone is just a blank. The
DictReader correctly returns records with blank fields.
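
A small sketch of that distinction, using an invented five-column header
(a to e) and in-memory data rather than a real file. The line of four commas
comes back as a record of empty strings, while the bare newline is skipped:

```python
import csv
import io

# Hypothetical data: a header, a blank *record*, a blank *line*, a full record.
data = "a,b,c,d,e\n,,,,\n\n1,2,3,4,5\n"

records = list(csv.DictReader(io.StringIO(data)))

# Two records come back: the all-empty one and the 1..5 one.
print(len(records))
print(dict(records[0]))
print(dict(records[1]))
```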


> 2.  It's a premature optimization. If skipping blank lines is desirable,
> then have another class on top of DictReader, maybe call it
> EmptyLineSkippingDictReader.

No, that's needless ravioli code. The csv module already defines a basic
reader that doesn't skip blank lines. Having two different DictReaders, one
which doesn't work correctly because it wrongly expands blank lines to
collections of blank fields, is not helpful.

Perhaps if they were called BrokenDictReader for the one which expands blank
lines to empty records, and DictReader for the one which correctly skips
blank lines.
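
If you really do need one result per physical line, the basic reader mentioned
above already gives you that. A rough sketch, again with made-up sample data:
csv.reader yields an empty list for a blank line, so you can decide for
yourself what to do with it.

```python
import csv
import io

# Hypothetical data with one blank line in the middle.
data = "id,name\n1,ann\n\n2,bob\n"

rows = list(csv.reader(io.StringIO(data)))

# The blank line shows up as an empty list instead of being skipped.
for line_number, row in enumerate(rows, start=1):
    print(line_number, row if row else "(blank line)")
```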


> 3. The intent of DictReader is to return a
> dict, nothing more, therefore the change of behavior is inappropriate.

No, if all you want is a dict, call dict() or use the dict display {}. The
intent of DictReader is to *read a CSV file and extract the records* as a
dict. Since blank lines aren't records, they should be skipped.


> Does anyone agree, or am I crazy?

I wouldn't want to guess your mental health based just on this isolated
incident, but if I had to make a diagnosis, I'd say, yes, crazy as a loon.

*wink*




-- 
Steve



