CSV performance

psaffrey at googlemail.com
Mon Apr 27 07:22:24 EDT 2009


I'm using the csv module to process a large amount of data - 28
files of 130MB each. Just reading in the data from one file and
filing it into very simple data structures (numpy arrays and a
cStringIO) takes around 10 seconds. If I just slurp one file into a
string, it only takes about a second, so I/O is not the bottleneck. Is
it really taking 9 seconds just to split the lines and set the
variables?
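
For reference, this is a rough sketch of the comparison I'm making
(the filename and the numeric first column are just stand-ins for my
real data; Python 2 era idioms):

    import csv
    import time

    import numpy as np

    FILENAME = "data.csv"  # hypothetical stand-in for one 130MB file

    # Baseline: slurp the raw file into a string
    t0 = time.time()
    raw = open(FILENAME, "rb").read()
    print "slurp: %.2fs" % (time.time() - t0)

    # Comparison: parse the same file with csv.reader
    t0 = time.time()
    values = []
    for row in csv.reader(open(FILENAME, "rb")):
        values.append(float(row[0]))  # assumes a numeric first column
    arr = np.array(values)
    print "csv.reader: %.2fs" % (time.time() - t0)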

Is there some way I can improve the CSV performance? Is there a way I
can slurp the file into memory and read it like a file from there?
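
To frame the second question: since csv.reader accepts any iterable
of lines, I imagine something like the sketch below, either wrapping
the slurped string in a cStringIO or just splitting it into lines
(again, the filename is hypothetical):

    import csv
    from cStringIO import StringIO

    # Slurp the whole file into memory first
    data = open("data.csv", "rb").read()

    # Option 1: wrap the string in a file-like object
    for row in csv.reader(StringIO(data)):
        pass  # process row here

    # Option 2: csv.reader accepts any iterable of lines
    for row in csv.reader(data.splitlines()):
        pass  # process row here

Would either of these be noticeably faster than handing csv.reader
the file object directly?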

Peter


