[Python-ideas] csv.DictReader could handle headers more intelligently.

Ethan Furman ethan at stoneleaf.us
Fri Jan 25 17:48:43 CET 2013


On 01/25/2013 02:58 AM, Mark Hackett wrote:
> On Friday 25 Jan 2013, Ethan Furman wrote:
>> On 01/24/2013 02:47 AM, Mark Hackett wrote:
>>> On Thursday 24 Jan 2013, Steven D'Aprano wrote:
>>>>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
>>>>> It's not terribly surprising once you sit down and think about it, but
>>>>> it's certainly at least a little unexpected to me that data is being
>>>>> thrown away with no notice.  It's unusual for errors to pass silently
>>>>> in python.
>>>>
>>>> Yes, we should not forget that a CSV file is not a dict. Just because
>>>> DictReader is implemented with a dict as the storage, doesn't mean
>>>> that it should behave exactly like a dict in all things. Multiple
>>>> columns with the same name are legal in CSV, so there should be a reader
>>>> for that situation.
>>>
>>> But just because it's reading a csv file, we shouldn't change how a
>>> dictionary works if you add the same key again.
>>
>> The proposal is not to change how a dict works, but what the proper
>> response is for DictReader when a duplicate key is found.
>
> Ethan, the proposal is predicated on the "silent abandonment" (which isn't
> actually the case any more than doing:
>
> a=4
> a=9
>
> is abandoning silently the 4.) being unexpected.

We're going to have to agree to disagree on this point: I think there 
is a huge difference between reassigning a variable that is completely 
under your control and losing entire columns of data from a file that 
you may never have seen before.
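A minimal sketch of the loss described above, using only the standard csv module. The sample data is invented for illustration. In CPython, DictReader builds each row with dict(zip(fieldnames, row)), so when two columns share a header the later value overwrites the earlier one without any notice:

```python
import csv
import io

# Invented sample export in which two columns share the header "item_no".
data = "item_no,name,item_no\n1,widget,A7\n2,gadget,B3\n"

rows = list(csv.DictReader(io.StringIO(data)))
print(rows[0])  # {'item_no': 'A7', 'name': 'widget'}
print(rows[1])  # {'item_no': 'B3', 'name': 'gadget'}
```

The first item_no values ("1" and "2") never reach the caller, and no warning or error is raised. The printed form assumes Python 3.8 or later, where DictReader yields plain dicts rather than OrderedDicts.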


> Except, just like the assignment in the aside above, this is entirely what IS
> expected if you're putting a CSV line into a dictionary with duplicate key
> names.

Expected by whom?  The library writer?  Sure.  The application writer? 
Maybe.  The person who, while creating the spreadsheet that will be 
dumped to csv and imported into the program, thought "This field also 
needs an item number... I'll call it 'item_no', just like that other 
column"?  Nope.


> If you don't want it to do what a dictionary does, then don't use DictReader,
> as Chris proposes.

DictReader puts a name on a column; that's its primary use.  I don't 
think the designers set out to drop data when they implemented it.  I 
suspect the possibility was simply overlooked (such a file not being the 
"normal" kind of csv file), or that a warning in the docs was overlooked.
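One possible alternative to a docs-only warning, sketched under the assumption that failing loudly is preferred over keeping the last value. checked_dict_reader is a hypothetical helper, not part of the csv module. It reads the header through DictReader.fieldnames and raises ValueError if any column name repeats:

```python
import csv
import io
from collections import Counter


def checked_dict_reader(f, **kwargs):
    """Hypothetical helper (not in the csv module): wrap csv.DictReader
    and refuse to continue if the header repeats a column name."""
    reader = csv.DictReader(f, **kwargs)
    # Accessing .fieldnames reads the header row when none was passed in;
    # it is None for an empty file.
    names = reader.fieldnames or []
    dupes = sorted(name for name, count in Counter(names).items() if count > 1)
    if dupes:
        raise ValueError(f"duplicate column names in header: {dupes}")
    return reader


# Usage: the duplicated "item_no" header is reported instead of dropped.
try:
    checked_dict_reader(io.StringIO("item_no,name,item_no\n1,widget,A7\n"))
except ValueError as exc:
    print(exc)  # duplicate column names in header: ['item_no']
```

Raising is only one design choice; a variant could instead emit a warning, or collect repeated columns into lists so that no data is lost.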

~Ethan~


