Is anyone happy with csv module?

Tue Dec 11 18:08:21 EST 2007

On Tue, 11 Dec 2007 18:49:27 -0300, massimo s. <devicerandom at gmail.com>  
wrote:

> On 11 Dec, 22:37, John Machin <sjmac... at lexicon.net> wrote:
>> On Dec 12, 6:14 am, "massimo s." <deviceran... at gmail.com> wrote:
>>
>> > Hi,
>>
>> > I'm struggling to use the python in-built csv module, and I must say
>> > I'm less than satisfied. Apart from being rather poorly documented,
>>
>> Patches are welcome :-)
>
> Yes, but maybe I was in the wrong. I'm not so bold to submit patches
> to an official Python module without asking.
> *I* feel troubles, but maybe it's just me being dense.
>
>>
>> > I
>> > find it especially cumbersome to use,
>>
>> Can you be more specific? What are you trying to do with it?
>
> See examples in previous post.
>
>> > and also rather limited.
>>
>> What extra facilities do you think there should be?
>
> Ability to work by columns together with rows and maybe some random
> access facilities would be nice. A more user-friendly interface too.
>
>> A CSV file is organised such that each line of the file represents a
>> row, and the nth field in the line relates to the nth column, so it's
>> natural for any CSV reader/writer to work by rows.
>
> Yes, but it's often natural for a spreadsheet-like thing to have
> organized columns of data.
> Often I want those columns to be read into lists, or to write lists
> into columns. The current csv module doesn't allow this naturally.
> Writing especially is a bit painful.
>
> I just wanted to know if there was something allowing this with a
> simple command that I missed, or if there just wasn't.

Expanding on a previous example:

     import csv
     data = list(csv.reader(......))    # every row, read into memory
     col3 = [row[3] for row in data]    # index 3 is the fourth column

Pretty simple, isn't it? If you prefer to use field names instead of  
indexes, try with a DictReader instead:

     data = [row for row in csv.DictReader(......)]
     price = [float(row['PRICE']) for row in data]
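
For anyone who wants to run the two snippets above as they are, here is a
minimal sketch that reads from an in-memory string instead of a real file.
The sample data, the NAME/PRICE field names and the two helper functions
are made up for illustration. zip(*rows) is one more way to turn rows into
columns:

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file.
SAMPLE = "NAME,PRICE\nwidget,1.50\ngadget,2.25\ngizmo,10.00\n"

def read_columns(text):
    """Return (header, columns); columns holds one tuple per column."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    columns = list(zip(*body))  # transpose: rows become columns
    return header, columns

def read_prices(text):
    """Read the PRICE column by field name and convert it to floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["PRICE"]) for row in reader]

header, columns = read_columns(SAMPLE)
prices = read_prices(SAMPLE)
print(header)
print(columns)
print(prices)
```

Note that csv.reader always yields strings, so any numeric conversion (as
in read_prices) has to be done explicitly.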

Note that all of the above (like any operation involving a whole *column*)  
requires reading the whole file into memory. Working by rows, on the other  
hand, only requires holding ONE row at a time. For big files this is  
significant.
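
To illustrate the row-at-a-time point, here is a small sketch that totals
one column without ever building a list of rows. The function name
total_column and the sample data are hypothetical; with a real file you
would pass the open file object instead of the StringIO:

```python
import csv
import io

def total_column(lines, field):
    """Sum a numeric column while holding only one row in memory at a time."""
    total = 0.0
    for row in csv.DictReader(lines):  # rows are produced lazily, one by one
        total += float(row[field])
    return total

# Hypothetical sample data standing in for a large file.
sample = io.StringIO("NAME,PRICE\nwidget,1.50\ngadget,2.25\n")
print(total_column(sample, "PRICE"))
```

Memory use here does not grow with the number of lines, which is why
working by rows scales to very large files.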

An example of writing data given in columns:

     id = [1,2,3,4]
     name = ['Moe','Larry','Curly','Shemp']
     hair = ['black','red',None,'black']
     writer = csv.writer(...)
     writer.writerows(itertools.izip(id, name, hair))
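
The snippet above uses itertools.izip, which exists only on Python 2. On
Python 3 the built-in zip is already lazy and does the same job. Here is a
self-contained sketch of the same idea that writes to an in-memory buffer
instead of a file. The helper write_columns is hypothetical, and the first
list is renamed to ids only to avoid shadowing the built-in id:

```python
import csv
import io

ids = [1, 2, 3, 4]
names = ["Moe", "Larry", "Curly", "Shemp"]
hair = ["black", "red", None, "black"]

def write_columns(out, *columns):
    """Write equally long column lists to `out`, one CSV row per position."""
    writer = csv.writer(out)
    writer.writerows(zip(*columns))  # zip pairs up the nth item of each column

# newline="" leaves the writer's own line endings untouched.
buffer = io.StringIO(newline="")
write_columns(buffer, ids, names, hair)
print(buffer.getvalue())
```

The csv writer emits None as an empty field, so the third row ends with a
trailing comma. If the columns differ in length, zip stops at the shortest
one; itertools.zip_longest is the alternative when padding is wanted.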

I think your problem is not with the csv module, but with a lack of  
familiarity with the Python language itself and how to use it efficiently.

> (Btw: who is using csv to read >10**6 lines of data?)

Me, and many others AFAIK. 1M lines is not so big, btw.

-- 
Gabriel Genellina