Is anyone happy with csv module?

Bruno Desthuilliers bdesth.quelquechose at free.quelquepart.fr
Tue Dec 11 19:29:15 EST 2007


massimo s. wrote:
> On 11 Dic, 20:24, "Guilherme Polo" <ggp... at gmail.com> wrote:
> 
> 
>>Post your actual problem so you can get more accurate help.
> 
> 
> Hi Guilhermo,
> I don't have an actual problem.

Yes you do - even if you don't realize it yet !-)

> I'm just trying to use the CSV module
> and I mostly can get it working. I just think its interface is much
> less than perfect. I'd like something I can, say, give a whole
> dictionary in input and obtain a CSV file in output, with each key of
> the dictionary being a column in the CSV file. Or a row, if I prefer.
> Something like:
> 
> dict={'First':[1,2,3,4],'Second':[10,20,30,40],'Third':
> [100,200,300,400]}

<ot>
you're shadowing the builtin 'dict' type here, which is usually a bad idea
</ot>

> f=open('test.csv','w')
> try:
>     csv_write_dict(f,dict,keys='columns',delimiter=',')
> finally:
>     f.close()
> 
> and obtaining:
> First,Second,Third
> 1,10,100
> 2,20,200
> 3,30,300
> 4,40,400

Doing the needed transformation (from a column:rows dict to the required 
format) is close to trivial. So you could actually implement it 
yourself, monkeypatch the relevant csv class, and submit a patch to the 
maintainer of the module.

FWIW, I never had data structured that way to pass to the csv module - 
truth be told, I don't think I've ever had a case where tabular data 
was structured by columns.

> Doing the same thing with the current csv module is much more
> cumbersome: see this example from http://www.oreillynet.com/onlamp/blog/2007/08/pymotw_csv.html
> 
> f = open(sys.argv[1], 'wt')
> try:
>     fieldnames = ('Title 1', 'Title 2', 'Title 3')
>     writer = csv.DictWriter(f, fieldnames=fieldnames)
>     headers = {}
>     for n in fieldnames:
>         headers[n] = n
>     writer.writerow(headers)
# same as the 4 lines above
writer.writerow(dict((item, item) for item in fieldnames))

>     for i in range(10):
>         writer.writerow({ 'Title 1':i+1,
>                           'Title 2':chr(ord('a') + i),
>                           'Title 3':'08/%02d/07' % (i+1),
>                           })

This one looks so totally unrealistic to me - I mean, wrt/ real-life 
use cases - that I won't even propose a rewrite.

> finally:
>     f.close()

A bit of a WTF, indeed. But most of the problem is with this example 
code, not with the csv module (apologies to whoever wrote this snippet).

FWIW, here's a function that does what you want, at least for your first use case:

def csv_write_cols(writer, data):
    keys = data.keys()
    # header row: each fieldname maps to itself
    writer.writerow(dict(zip(keys, keys)))
    # zip(*values) turns the columns back into rows
    for row in zip(*data.values()):
        writer.writerow(dict(zip(keys, row)))

Now you can do what you want - but as far as I'm concerned, I wouldn't 
start a total rewrite of an otherwise working (and non-trivial) module 
just for a trivial four-line function.
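Putting the function above together with the dict from your first message - a sketch, where the writer is a csv.DictWriter built from the dict's keys, and 'test.csv' is just an arbitrary filename:

```python
import csv

# the dict from the original message
data = {'First': [1, 2, 3, 4], 'Second': [10, 20, 30, 40],
        'Third': [100, 200, 300, 400]}

def csv_write_cols(writer, data):
    keys = list(data.keys())
    writer.writerow(dict(zip(keys, keys)))  # header row
    for row in zip(*data.values()):
        writer.writerow(dict(zip(keys, row)))

f = open('test.csv', 'w')
try:
    csv_write_cols(csv.DictWriter(f, fieldnames=list(data.keys())), data)
finally:
    f.close()
```

which produces exactly the output you asked for.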

Also, have you considered that your columns may as well be rows, ie:

First,  1,   2,   3,   4
Second, 10,  20,  30,  40
Third,  100, 200, 300, 400
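In which case a plain csv.writer is all you need - a sketch, assuming the same dict ('test_rows.csv' is an arbitrary name):

```python
import csv

data = {'First': [1, 2, 3, 4], 'Second': [10, 20, 30, 40],
        'Third': [100, 200, 300, 400]}

f = open('test_rows.csv', 'w')
try:
    writer = csv.writer(f)
    for key, values in data.items():
        # one line per key: the label first, then its values
        writer.writerow([key] + list(values))
finally:
    f.close()
```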

> 
> Another unrelated quirk I've found is that iterating the rows read by
> a csv reader object seems to erase the rows themselves; I have to copy
> them in another list to use them.

It's not a "quirk", Sir, it's a feature !-)

The csv reader object - like file objects and a couple of others - is an 
iterator. In this case, it means the csv reader is smart enough not to 
read the whole file into memory - which is not necessarily what you 
want, especially for huge files - but instead yields rows only as you 
ask for them.

Note that if you need the whole thing in memory, "copying" the rows into 
a list is a no-brainer:
   rows = list(reader)
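A quick illustration of this one-shot behaviour, using an in-memory file and made-up data for brevity:

```python
import csv
import io  # io.StringIO stands in for a real file here

reader = csv.reader(io.StringIO("a,b\n1,2\n3,4\n"))
first_pass = list(reader)   # consumes the iterator entirely
second_pass = list(reader)  # the iterator is exhausted: nothing left
# first_pass  == [['a', 'b'], ['1', '2'], ['3', '4']]
# second_pass == []
```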


> Probably it's me not being a professional programmer,

<ot>
Not sure the professional status is key here - I mean, it just means 
you're getting paid for it, and says nothing about your competence.
</ot>

> so I don't
> understand that somehow the csv module *has* to be done this way. If
> it's so, I'd like to know about it so I can learn something.

As for why it's sometimes better not to read a whole file into memory 
at once, try it with a multi-gigabyte file and watch your system crawl 
to a halt. wrt/ csv being 'row-oriented', the fact is that 1/ it's by 
far the most common use case for tabular data and 2/ it's a simple 
mapping from lines to rows (and back) - which matters for both 
performance and maintainability. Try to read a csv file "by columns" 
and you'll find that you either need to read it all into memory, parse 
it line by line, then turn lines into columns (the inverse of my small 
function above), or rearrange your data the way I suggested above. And 
let's not talk about writing...
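For the record, that inverse transformation is itself short once everything is in memory - a sketch on made-up data:

```python
import csv
import io

# a small in-memory csv file, structured by rows as usual
sample = "First,Second,Third\r\n1,10,100\r\n2,20,200\r\n"
rows = list(csv.reader(io.StringIO(sample)))  # whole file in memory
header, body = rows[0], rows[1:]
# turn the list of rows into a column-name -> column-values dict
columns = dict(zip(header, (list(col) for col in zip(*body))))
# columns == {'First': ['1', '2'], 'Second': ['10', '20'],
#             'Third': ['100', '200']}
```

but note that `list(csv.reader(...))` is exactly the "read it all into memory" step you can't avoid here.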

Now I don't mean there's no room for improvement in the csv module - 
there almost always is - but given the usefulness of this module in a 
programmer's daily life, it would probably have been superseded by 
something better if it wasn't at least perceived as good enough by its 
users.

HTH


