Is anyone happy with csv module?
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Tue Dec 11 18:08:21 EST 2007
On Tue, 11 Dec 2007 18:49:27 -0300, massimo s. <devicerandom at gmail.com>
wrote:
> On 11 Dec, 22:37, John Machin <sjmac... at lexicon.net> wrote:
>> On Dec 12, 6:14 am, "massimo s." <deviceran... at gmail.com> wrote:
>>
>> > Hi,
>>
>> > I'm struggling to use the python in-built csv module, and I must say
>> > I'm less than satisfied. Apart from being rather poorly documented,
>>
>> Patches are welcome :-)
>
> Yes, but maybe I was in the wrong. I'm not so bold to submit patches
> to an official Python module without asking.
> *I* feel troubles, but maybe it's just me being dense.
>
>>
>> > I
>> > find it especially cumbersome to use,
>>
>> Can you be more specific? What are you trying to do with it?
>
> See examples in previous post.
>
>> > and also rather limited.
>>
>> What extra facilities do you think there should be?
>
> Ability to work by columns together with rows and maybe some random
> access facilities would be nice. A more user-friendly interface too.
>
>> A CSV file is organised such that each line of the file represents a
>> row, and the nth field in the line relates to the nth column, so it's
>> natural for any CSV reader/writer to work by rows.
>
> Yes, but it's natural for a spreadsheet-like thing to have organized
> columns of data, often.
> Often I want those columns to be read into lists, or to write lists
> into columns. The actual csv doesn't allow this naturally. Especially
> writing is a bit painful.
>
> I just wanted to know if there was something allowing this with a
> simple command, that I missed, or if just there wasn't.

Expanding on a previous example:

    import csv

    # read every row, then pick the field at index 3 from each one
    data = [row for row in csv.reader(......)]
    col3 = [row[3] for row in data]

Pretty simple, isn't it? If you prefer to use field names instead of
indexes, try a DictReader instead:

    # each row is now a dict keyed by the names in the header line
    data = [row for row in csv.DictReader(......)]
    price = [float(row['PRICE']) for row in data]

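
For anyone who wants to try the two snippets above directly, here is a minimal self-contained sketch. It assumes Python 3 and uses an in-memory string in place of a real file; the sample data, the ITEM column and the helper function names are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file on disk.
SAMPLE = "ITEM,PRICE\napple,1.50\npear,2.25\nfig,4.00\n"

def read_column_by_index(text, index):
    """Return one column as a list of strings, skipping the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return [row[index] for row in rows[1:]]

def read_column_by_name(text, name):
    """Return one column as a list of strings, looked up by header name."""
    return [row[name] for row in csv.DictReader(io.StringIO(text))]

items = read_column_by_index(SAMPLE, 0)
prices = [float(p) for p in read_column_by_name(SAMPLE, "PRICE")]
print(items)
print(prices)
```

Both helpers load every row before returning, which is what makes whole-column access possible.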
Note that all of the above (like any operation involving a whole *column*)
requires reading the whole file into memory. Working by rows, on the other
hand, only requires holding ONE row at a time. For big files this is
significant.
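
To illustrate the row-at-a-time approach, here is a rough sketch that sums a column without ever building a list of rows. It assumes Python 3; the sample data and the total_price helper are hypothetical, and with a real file you would pass the open file object instead of the StringIO.

```python
import csv
import io

# Hypothetical sample data; a real script would pass an open file instead.
SAMPLE = "ITEM,PRICE\napple,1.50\npear,2.25\nfig,4.00\n"

def total_price(lines, field="PRICE"):
    """Sum one numeric column while holding only the current row in memory."""
    total = 0.0
    for row in csv.DictReader(lines):
        total += float(row[field])
    return total

total = total_price(io.StringIO(SAMPLE))
print(total)
```

The reader is consumed lazily by the for loop, so memory use stays flat no matter how many lines the input has.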
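
An example of writing data given in columns:

    import csv
    import itertools

    ids = [1, 2, 3, 4]    # named ids so the built-in id() is not shadowed
    name = ['Moe', 'Larry', 'Curly', 'Shemp']
    hair = ['black', 'red', None, 'black']
    writer = csv.writer(...)
    # izip pairs up the nth item of each list, i.e. it builds the rows
    # (Python 2; on Python 3 the built-in zip does the same job)
    writer.writerows(itertools.izip(ids, name, hair))
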
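
A runnable variant of the column-writing example, as a sketch only: it assumes Python 3 (so the built-in zip replaces itertools.izip), writes to an in-memory buffer rather than a file, and adds a header row. The write_columns helper and the header names are my own additions for illustration.

```python
import csv
import io

ids = [1, 2, 3, 4]
names = ['Moe', 'Larry', 'Curly', 'Shemp']
hair = ['black', 'red', None, 'black']

def write_columns(out, header, *columns):
    """Write equal-length columns as CSV rows to the file-like object out."""
    # lineterminator is set only to keep the in-memory output easy to read;
    # with a real file, open it with newline='' and keep the default.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    # zip(*columns) turns the columns into rows: (1, 'Moe', 'black'), ...
    writer.writerows(zip(*columns))

buf = io.StringIO()
write_columns(buf, ["id", "name", "hair"], ids, names, hair)
text = buf.getvalue()
print(text)
```

Two things to keep in mind: the csv writer emits None as an empty field, and zip stops at the shortest column, so columns of unequal length are silently truncated.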
I think your problem is not with the csv module, but a lack of familiarity
with the Python language itself and how to use it efficiently.
> (Btw: who is using csv to read >10**6 lines of data?)
Me, and many others AFAIK. 1M lines is not so big, btw.
--
Gabriel Genellina