[Python-ideas] csv.DictReader could handle headers more intelligently.

Shane Green shane at umbrellacode.com
Tue Jan 29 11:18:21 CET 2013


So I wasn't really questioning the usefulness of the dictionary representation, but couldn't the returned object also let you access the header and value sequences, etc.?  I was also thinking the conversion to a simple dict with a single (non-list) value per column could be part of the API.
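
To make that concrete, here is a rough sketch of the kind of object I mean.  Nothing here exists in the csv module: Row, as_dict and read_rows are names made up for illustration, and the sample data is invented.

```python
import csv
import io


class Row:
    """Hypothetical row wrapper that keeps the header and value sequences."""

    def __init__(self, headers, values):
        self.headers = list(headers)
        self.values = list(values)

    def __getitem__(self, name):
        # Every value stored under this header, in source order.
        found = [v for h, v in zip(self.headers, self.values) if h == name]
        if not found:
            raise KeyError(name)
        return found

    def as_dict(self, index=-1):
        # Collapse to one value per column; index=-1 keeps the last duplicate.
        return {h: self[h][index] for h in self.headers}


def read_rows(f):
    # Hypothetical helper: treat the first line as the header row.
    reader = csv.reader(f)
    headers = next(reader)
    for values in reader:
        yield Row(headers, values)


sample = io.StringIO("name,VALUE,VALUE\nx,1,2\n")
row = next(read_rows(sample))
```

With this, row.headers and row.values give positional access, row["VALUE"] returns every value under that header, and row.as_dict() trims the row down to a plain dict when the duplicates don't matter.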

Appending duplicate field values as they're read reflects the order in which the duplicate entries appear in the source (when I've encountered CSV that purposely used duplicate column headers, the sequence in which they appeared was critical).  The output from the current implementation should reflect the last duplicate value, as that always replaces previous ones in the dict, so my conversions returned the last value (-1), which should do the same…I think.  It was a straw man ;-).
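
A quick way to check the "last value" claim, as a sketch only: the data is invented (it borrows the "BLABLA", "VALUE" headers from the example quoted below), and the list-valued dict is built by hand rather than by anything in the csv module.

```python
import csv
import io

DATA = "BLABLA,VALUE,VALUE\nfoo,1,2\n"

# Current behaviour: a later duplicate header overwrites the earlier one.
plain = next(csv.DictReader(io.StringIO(DATA)))

# List-valued alternative: append values in the order they are read.
reader = csv.reader(io.StringIO(DATA))
headers = next(reader)
multi = {}
for header, value in zip(headers, next(reader)):
    multi.setdefault(header, []).append(value)

# Taking the last element of each list should match the plain dict.
collapsed = {header: values[-1] for header, values in multi.items()}
```

Here multi["VALUE"] keeps both values in source order, while collapsed should compare equal to what DictReader returns today.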

I see your point about the point.  I think it would be good to have an implementation that kept all the information but still put the most usable API possible on it, rather than saying you can't have dictionary access unless you want to lose duplicate values, for example.  I mean, I've needed to consume CSV a lot, and that's what would have made the module useful to me, and an implementation that keeps all the information and lets it be trimmed easily when it isn't needed seems better than one that just wipes it out from the start.

Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 12:17 AM, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> Shane Green writes:
> 
>> Actually I've seen many real-life examples of CSV files with
>> repeated column names,
> 
> Sure, but this really isn't the issue.  If it were, "csv.reader is
> your friend" would be all the answer that the issue deserves IMHO.
> 
>> It seems like we're getting too hung up on dicts:
> 
> Not at all.  (For reasons I don't understand) Somebody has a use case
> where it's useful to have the field names stored in each record,
> rather than stored once and have both field names and field values
> accessed by position as needed.  The point is to return a name-value
> *mapping object* for *each* row, and that may as well be a dict.
> 
> The people who suggest a multidict or a list-valued dict are missing
> that point, AFAICS.  Eg, in your "BLABLA", "VALUE", ..., "VALUE"
> example, position really is what matters, so a dict of any kind is
> inappropriate IMO.  Again, it's arbitrary whether the list-valued dict
> does d["VALUE"].append(x) or d["VALUE"].insert(0,x), and it's hard for
> me to guess which it would do in practice: .append is easier to write,
> but .insert seems closer to the behavior of csv.reader (which is what
> we really want in your example IMO).
> 
