How pickle helps in reading huge files?

Roy Smith roy at panix.com
Wed Oct 16 08:29:46 EDT 2013


In article <0044bfd0-f07f-4f7b-b976-5df034b6fec6 at googlegroups.com>,
 Harsh Jha <harshjha2006 at gmail.com> wrote:

> I've a huge csv file and I want to read stuff from it again and again. Is it 
> useful to pickle it and keep it around, and then unpickle it whenever I need 
> to use that data? Is it faster than accessing that file simply by opening it 
> again and again? Please explain why.
> 
> Thank you.

It can be.  I did a project a bunch of years ago which involved reading 
(and parsing) SNMP MIBs before the program could do any real work.  
Startup took something like 10-20 seconds.  If I pre-parsed the MIBs and 
wrote out the data structures as pickles, I could cut startup time to a 
couple of seconds.
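
The pattern is simple: parse once, dump the result, and on later runs 
load the pickle if it's there.  Roughly like this (an untested sketch, 
not my actual code; parse_mibs() stands in for whatever your expensive 
parsing step is, and the cache file name is made up):

    import os
    import cPickle as pickle   # on Python 3, just "import pickle"

    CACHE = 'mibs.pickle'      # made-up cache file name

    def load_data():
        # If we've already parsed once, load the cached result.
        if os.path.exists(CACHE):
            with open(CACHE, 'rb') as f:
                return pickle.load(f)
        # Otherwise do the slow parse and cache it for next time.
        data = parse_mibs()    # placeholder for the expensive step
        with open(CACHE, 'wb') as f:
            pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
        return data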

But that's because the parsing I was doing was pretty complicated.  
Parsing a CSV file is much easier, so I wouldn't expect you to see much 
improvement reading a pickle file vs. reading the original CSV.

The bottom line is, you should try it.  Pickling a data structure is 
about one line of code (not counting the 'import cPickle').  Try it and 
see what happens.  Time how long it takes to read the original file, and 
how long it takes to read the pickle.  Let us know your results.
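
For the timing itself, something like this is all it takes (again an 
untested sketch; the file names are placeholders):

    import csv
    import time
    import cPickle as pickle   # on Python 3, just "import pickle"

    # First run: read the CSV and write the parsed rows out as a pickle.
    t0 = time.time()
    with open('data.csv', 'rb') as f:    # csv wants binary mode on Python 2
        rows = list(csv.reader(f))
    print 'CSV read took %.2f seconds' % (time.time() - t0)

    with open('data.pickle', 'wb') as f:
        pickle.dump(rows, f, pickle.HIGHEST_PROTOCOL)

    # Later runs: load the pickle instead and compare.
    t0 = time.time()
    with open('data.pickle', 'rb') as f:
        rows = pickle.load(f)
    print 'pickle read took %.2f seconds' % (time.time() - t0)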

Also, let us know what "huge" means.  1000 rows?  A million?  100 
million?


