[Python-ideas] csv.DictReader could handle headers more intelligently.

Shane Green shane at umbrellacode.com
Thu Jan 24 12:55:05 CET 2013


Not sure if I'm reading the discussion correctly, but it sounds like the question is whether it's acceptable to swallow CSV values when confronted with multiple columns of the same name, which seems very incorrect if so.  CSV doesn't even mandate that column headers exist at all, as far as I know.  If anything, I would think mapping column positions to header values would make sense, such that header.items() -> [(0, header1), (1, header2), (2, header3), ...], where header1 and header2 could be equal.  To work with rows as dictionaries, they could follow the FieldStorage model and hold lists of values, either only when there's a collision or always, so that all column values are retained.
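
To make that concrete, here is a rough sketch of both ideas. The names header_map and read_multidict are made up for illustration and are not part of the csv module; the sketch also assumes the first row is a header row, and it takes the "always a list" variant of the FieldStorage-style model.

```python
import csv
import io
from collections import defaultdict


def header_map(source):
    """Map column position -> header name, so duplicate names can coexist.

    Hypothetical helper: assumes the first row of `source` is the header.
    """
    reader = csv.reader(source)
    return dict(enumerate(next(reader)))


def read_multidict(source):
    """Yield one dict per row, mapping each header name to a list of values.

    Hypothetical helper: every value is a list, so nothing is dropped when
    two columns share a name.
    """
    reader = csv.reader(source)
    names = next(reader)
    for row in reader:
        record = defaultdict(list)
        for name, value in zip(names, row):
            record[name].append(value)
        yield dict(record)


sample = "a,b,a\n1,2,3\n4,5,6\n"

positions = header_map(io.StringIO(sample))
rows = list(read_multidict(io.StringIO(sample)))

# For comparison: the existing DictReader keeps only the last "a" column.
plain = list(csv.DictReader(io.StringIO(sample)))
```

With the sample above, positions keeps both "a" columns under keys 0 and 2, and each entry in rows holds both values for "a", whereas the DictReader rows hold only the last one.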




Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 2:47 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Thursday 24 Jan 2013, Steven D'Aprano wrote:
> 
>>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
>>> It's not terribly surprising once you sit down and think about it, but
>>> it's certainly at least a little unexpected to me that data is being
>>> thrown away with no notice.  It's unusual for errors to pass silently
>>> in python.
>> 
>> Yes, we should not forget that a CSV file is not a dict. Just because
>> DictReader is implemented with a dict as the storage, doesn't mean that it
>> should behave exactly like a dict in all things. Multiple columns with the
>> same name are legal in CSV, so there should be a reader for that
>> situation.
>> 
> 
> But just because it's reading a csv file, we shouldn't change how a dictionary 
> works if you add the same key again.
> 
> Duplicate headings in a csv file are as legal as using the same name for 
> something else in a programming language.
> 
> e.g.
> 
> endvalue=a+b+c/5
> ...code using that result...
> endvalue = os.printerr(file_descriptor)
> ...print out an error string...
> 
> this is "legal" but really REALLY smelly.
> 
> Similarly a multivalued csv file.
> 
> Excel uses the column ID not the name on the first row, to identify the columns 
> in its macro language. Because otherwise which "endvalue" column did you mean?
