CSV performance

Tim Chase python.list at tim.thechases.com
Mon Apr 27 08:47:05 EDT 2009


> I'm using the CSV library to process a large amount of data -
> 28 files, each of 130MB. Just reading in the data from one
> file and filing it into very simple data structures (numpy
> arrays and a cstringio) takes around 10 seconds. If I just
> slurp one file into a string, it only takes about a second, so
> I/O is not the bottleneck. Is it really taking 9 seconds just
> to split the lines and set the variables?

You've omitted one important test:  spinning through the file 
with csv parsing, but without any of the "filing it into very 
simple data structures".  Without that metric, there's no way 
to know whether the csv module is at fault, or whether you're 
doing something malperformant with the data structures.
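
Something along these lines would isolate the parsing cost 
(the filename is just a placeholder for one of your 130MB files):

  import csv
  import time

  FILENAME = "data.csv"  # placeholder; substitute a real file

  start = time.time()
  with open(FILENAME) as f:
      for row in csv.reader(f):
          pass          # parse every row, but build no data structures
  print("csv parsing alone: %.2f seconds" % (time.time() - start))

If that loop already takes most of the 10 seconds, the csv module 
is the bottleneck; if not, look at how the rows get stored.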

-tkc