[Python-ideas] csv.DictReader could handle headers more intelligently.

Jerry Hill malaclypse2 at gmail.com
Wed Jan 23 20:59:42 CET 2013


On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett
<mark.hackett at metoffice.gov.uk> wrote:
> I can't see why there would be duplicate column headers for a valid reason.
>
> Someone may have written their CSV export incorrectly, but that's not actually
> valid.

Sure it is.  Since there is no formal spec for .csv files, a file with
multiple columns sharing the same header text is a perfectly valid
.csv file.  For what it's worth, the informal spec for csv files seems
to be "whatever Excel does" and Excel (and every other
spreadsheet-oriented program) is happy to let you have duplicated
headers too.

> It would therefore be arguable for the program to give at least a WARNING that
> it's throwing data away.

I think the library should give the programmer some sort of indication
that they are losing data.  Personally, I'd prefer an exception which
can either be caught or not, depending on whether the program is
designed to handle the situation or not.
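
For illustration, a minimal sketch of that kind of check, written as a
wrapper around the existing csv.DictReader rather than a change to it.
The names strict_dict_reader and DuplicateHeaderError are hypothetical
and not part of the csv module; only the public fieldnames attribute is
used.

```python
import csv
from collections import Counter


class DuplicateHeaderError(ValueError):
    """Hypothetical exception; the csv module defines nothing like it."""


def strict_dict_reader(f, **kwargs):
    """Return a csv.DictReader, raising if the header row repeats a name.

    Hypothetical helper: it only inspects the public ``fieldnames``
    attribute, which reads the header row on first access.
    """
    reader = csv.DictReader(f, **kwargs)
    names = reader.fieldnames or []
    # Collect every header that appears more than once.
    dupes = [name for name, count in Counter(names).items() if count > 1]
    if dupes:
        raise DuplicateHeaderError(
            "duplicate column headers: " + ", ".join(sorted(dupes))
        )
    return reader
```

A caller that knows how to cope with duplicates can catch
DuplicateHeaderError; one that doesn't gets a traceback instead of
quietly missing columns.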

> However, python is mechanising this as a dictionary, and in python
> setting A to 1 then setting A to 3 would throw away the earlier value for A,
> so the import function is working AS EXPECTED in Python.

I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
It's not terribly surprising once you sit down and think about it, but
it's certainly at least a little unexpected to me that data is being
thrown away with no notice.  It's unusual for errors to pass silently
in Python.
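
To make the data loss concrete, here is a small sketch using made-up
sample data (the column names and values are only for illustration).
The header repeats "score", and the stock csv.DictReader keeps only the
last column with that name.

```python
import csv
import io

# Made-up sample data: the header "score" appears twice.
data = io.StringIO("name,score,score\nalice,1,3\n")

rows = list(csv.DictReader(data))
row = dict(rows[0])

# The second "score" column overwrites the first, so the value "1" is
# dropped: row == {"name": "alice", "score": "3"}
print(row)
```

No exception is raised and no warning is issued; the first score value
simply never reaches the program.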

-- 
Jerry


