[Python-ideas] csv.DictReader could handle headers more intelligently.

Mark Hackett mark.hackett at metoffice.gov.uk
Mon Jan 28 13:13:45 CET 2013


On Saturday 26 Jan 2013, Stephen J. Turnbull wrote:
> Shane Green writes:
>  > And while it's true that a dictionary is a dictionary and it works
>  > the way it works, the real point that drives home is that it's an
>  > inappropriate mechanism for dealing ordered rows of sequential
>  > values.
> 
> Right!  So use csv.reader, or csv.DictReader with an explicit
> fieldnames argument.
> 
> The point of csv.DictReader with default fieldnames is to take a
> "well-behaved" table and turn it into a sequence of "poor-man's"
> objects.
> 

Well, though there's another example out there of what to do next, I was 
thinking of being able to define the csv file format so that you could write it 
out correctly too.

And to that end, some form of description of the csv file is needed. I was 
thinking something like this:

A,B,C,A,D,E
{(A:2,A:1),B,C,D,E}

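The first line is the header row as it appears in the file and the second says 
where each column goes. A:2 is the second column named A and A:1 the first, so 
this would put columns 4 and 1 in the first entry (under the name A) as a 
list, in that order, followed by B, C, D and E, all expected to be single 
unique names.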

This also allows the same definition to be used to write it out.
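
A rough sketch of the reading side, to make the idea concrete. It does not parse 
the brace syntax above; the spec is written directly as a Python list, and the 
name read_with_spec and that list layout are made up for illustration only:

```python
import csv
import io

def read_with_spec(lines, spec):
    """Yield one dict per data row, laid out as spec describes.

    spec items are either a header name (a single unique column) or a
    tuple of (name, occurrence) pairs gathered into a list under that name.
    This layout is an assumption, not part of the csv module.
    """
    reader = csv.reader(lines)
    header = next(reader)
    # Map (name, 1-based occurrence) to the column index in the file.
    seen = {}
    position = {}
    for index, name in enumerate(header):
        seen[name] = seen.get(name, 0) + 1
        position[(name, seen[name])] = index
    for row in reader:
        record = {}
        for item in spec:
            if isinstance(item, str):
                record[item] = row[position[(item, 1)]]
            else:
                key = item[0][0]
                record[key] = [row[position[pair]] for pair in item]
        yield record

sample = io.StringIO("A,B,C,A,D,E\n1,2,3,4,5,6\n")
# Stands in for {(A:2,A:1),B,C,D,E}
spec = [(("A", 2), ("A", 1)), "B", "C", "D", "E"]
rows = list(read_with_spec(sample, spec))
```

With that one data row, rows[0]["A"] holds the value from column 4 followed by 
the value from column 1.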

Blank headers are denoted with:

A,,,,,,B,C

And headers not used in the dictionary (discarded) are handled by not being 
put in the "where do we put this" line:
A,B,C,D
{A,D}
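
Discarding could look something like the following, assuming the "where do we 
put this" line has already been turned into a plain list of wanted names 
(read_keeping is a hypothetical helper, not an existing csv function):

```python
import csv
import io

def read_keeping(lines, wanted):
    """Yield dicts containing only the columns whose header is in wanted."""
    reader = csv.reader(lines)
    header = next(reader)
    # Remember which column indexes survive; everything else is dropped.
    keep = [(i, name) for i, name in enumerate(header) if name in wanted]
    for row in reader:
        yield {name: row[i] for i, name in keep}

# Stands in for the header line A,B,C,D with the spec {A,D}
kept = list(read_keeping(io.StringIO("A,B,C,D\n1,2,3,4\n"), ["A", "D"]))
```

Blank headers fall out the same way, since an empty name never appears in the 
wanted list.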

When writing out, you cannot have empty headers (since these values get 
dropped and the output format spec is now no longer suitable), and you must 
assign each header to a dictionary entry (else again the dictionary doesn't 
contain all the data that was in the input).
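
Those two rules could be checked up front with something like this sketch. The 
function name and the idea of passing the assigned names as a flat list (a 
name repeated once per column it consumes) are assumptions for illustration:

```python
from collections import Counter

def write_problems(header, assigned):
    """Return the reasons a header/spec pair cannot be written back out."""
    problems = []
    # Rule 1: a blank header means its values were dropped on input.
    if any(name == "" for name in header):
        problems.append("blank header")
    # Rule 2: every named column must be assigned somewhere in the spec.
    missing = Counter(header) - Counter(assigned)
    missing.pop("", None)  # blanks are already reported above
    for name in sorted(missing):
        problems.append("unassigned header: " + name)
    return problems

ok = write_problems("A,B,C,A,D,E".split(","), ["A", "A", "B", "C", "D", "E"])
blank = write_problems("A,,,,,,B,C".split(","), ["A", "B", "C"])
dropped = write_problems("A,B,C,D".split(","), ["A", "D"])
```

Only the first of the three specs comes back with no problems; the other two 
are the cases that cannot be written out as they stand.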

To write out these two types of input file, you need to create a new csv format 
spec which CAN be written out.

Therefore you will have to deliberately define an output that loses data.


