[Python-ideas] csv.DictReader could handle headers more intelligently.

Steven D'Aprano steve at pearwood.info
Thu Jan 24 01:26:52 CET 2013


On 24/01/13 06:59, Jerry Hill wrote:
> On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett
> <mark.hackett at metoffice.gov.uk>  wrote:
>> I can't see why there would be duplicate column headers for a valid reason.
>>
>> Someone may have written their CSV export incorrectly, but that's not actually
>> valid.
>
> Sure it is.  Since there is no formal spec for .csv files, having
> multiple columns with the same text in the header is a perfectly valid
> .csv file.  For what it's worth, the informal spec for csv files seems
> to be "whatever Excel does" and Excel (and every other
> spreadsheet-oriented program) is happy to let you have duplicated
> headers too.

+1

I think keeping DictReader as it is now is fine for backward compatibility.
Or better, simply have DictReader raise an exception rather than silently
eat data. I don't expect that anyone is relying on that behaviour, nor is
it behaviour promised by the class.
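For illustration, here is a minimal sketch of the exception-raising option. It assumes a subclass is acceptable, and the name StrictDictReader is hypothetical (it is not part of the csv module). It relies on the fact that in CPython's pure-Python csv module, DictReader.fieldnames is a property that reads the header row lazily the first time it is accessed:

```python
import csv
import io
from collections import Counter


class StrictDictReader(csv.DictReader):
    """Hypothetical DictReader variant: raises ValueError on duplicate
    header names instead of silently keeping only the last column."""

    @property
    def fieldnames(self):
        # The base property reads the header row lazily on first access.
        names = super().fieldnames
        if names is not None:
            dupes = [name for name, count in Counter(names).items() if count > 1]
            if dupes:
                raise ValueError(f"duplicate column headers: {dupes}")
        return names


try:
    for row in StrictDictReader(io.StringIO("name,score,score\nalice,10,20\n")):
        print(row)
except ValueError as exc:
    print(exc)  # duplicate column headers: ['score']
```

The caller can catch the ValueError or let it propagate, so the data loss is no longer silent.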

But we should add a MultiDictReader that supports multiple columns with
the same name.
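One possible shape for such a reader, sketched under my own assumptions (MultiDictReader does not exist in the csv module; the name and behaviour below are hypothetical): each row maps a header name to the list of values found under every column with that name, in column order, so nothing is discarded.

```python
import csv
import io
from typing import Dict, Iterator, List


class MultiDictReader:
    """Hypothetical reader: maps each header name to a list of the values
    from every column carrying that name, so duplicates are preserved."""

    def __init__(self, f, **kwargs):
        self.reader = csv.reader(f, **kwargs)
        # Assumption: the first row is always the header row.
        self.fieldnames = next(self.reader, None)

    def __iter__(self) -> Iterator[Dict[str, List[str]]]:
        return self

    def __next__(self) -> Dict[str, List[str]]:
        if self.fieldnames is None:  # empty input: no header, no rows
            raise StopIteration
        row = next(self.reader)
        while row == []:  # skip blank lines, as DictReader does
            row = next(self.reader)
        result: Dict[str, List[str]] = {}
        for name, value in zip(self.fieldnames, row):
            result.setdefault(name, []).append(value)
        return result


data = "name,score,score\nalice,10,20\n"
for row in MultiDictReader(io.StringIO(data)):
    print(row)  # {'name': ['alice'], 'score': ['10', '20']}
```

This sketch deliberately ignores DictReader's restkey/restval handling of short and long rows (zip simply stops at the shorter of header and row); a real implementation would need to decide what to do there.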


>> It would therefore be arguable for the program to give at least a WARNING that
>> it's throwing data away.
>
> I think the library should give the programmer some sort of indication
> that they are losing data.  Personally, I'd prefer an exception which
> can either be caught or not, depending on whether the program is
> designed to handle the situation or not.
>
>> However, since python is mechanising this as a dictionary, and since in python
>> setting A to 1 then setting A to 3 would throw away the earlier value for A,
>> the import function is working AS EXPECTED in Python.
>
> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
> It's not terribly surprising once you sit down and think about it, but
> it's certainly at least a little unexpected to me that data is being
> thrown away with no notice.  It's unusual for errors to pass silently
> in python.

Yes, we should not forget that a CSV file is not a dict. Just because DictReader
is implemented with a dict as the storage doesn't mean that it should behave
exactly like a dict in all things. Multiple columns with the same name are legal
in CSV, so there should be a reader for that situation.
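To make the data loss concrete: CPython's DictReader builds each row with dict(zip(fieldnames, row)), so when a header name repeats, only the value of the last column with that name survives. A small illustration (the sample data is made up):

```python
import csv
import io

# Two columns share the header "score".
data = "name,score,score\nalice,10,20\n"
reader = csv.DictReader(io.StringIO(data))
row = next(reader)

print(reader.fieldnames)  # ['name', 'score', 'score']
print(row["score"])       # 20  (the 10 from the first "score" column is gone)
print(len(row))           # 2, although the row had three fields
```

No warning or exception is raised; the first "score" value simply disappears.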



-- 
Steven


