pickle.load() extremely slow performance
Carl Banks
pavlovevidence at gmail.com
Fri Mar 20 22:30:07 EDT 2009
On Mar 20, 5:26 pm, Jim Garrison <j... at acm.org> wrote:
> John Machin wrote:
> > On Mar 21, 9:25 am, Jim Garrison <j... at acm.org> wrote:
> >> I'm converting a Perl system to Python, and have run into a severe
> >> performance problem with pickle.
>
> >> One facet of the system involves scanning and loading into memory a
> >> couple of parallel directory trees containing OTO 10^4 files. The
> >> trees don't change during development/testing and the scan takes 30-40
> >> seconds, so to save time I cache the loaded tree structure to disk, in
> >> Perl with module Storable, and in Python with pickle.
>
> >> In Perl, the save operation produces a file of about 3MB, and both
> >> save and restore take a second or two. In Python, pickle.dump()
> >> produces a similar-size file but takes 20 seconds, and pickle.load()
> >> takes 45 seconds, which is actually LONGER than the time required to
> >> scan the directory trees.
>
> >> Is there anything I can do to speed up pickle.load() to get
> >> performance comparable to Perl's Storable?
>
> > Have you read this:
> > http://www.python.org/doc/2.6/library/pickle.html
> > ?
> > Have you considered using cPickle instead of pickle?
> > Have you considered using *ickle.dump(..., protocol=-1) ?
>
> I'm using Python 3 on Windows (Server 2003). According to the docs
>
> "The pickle module has an transparent optimizer (_pickle) written
> in C. It is used whenever available. Otherwise the pure Python
> implementation is used."
>
> How can I tell if _pickle is being used?
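On the _pickle question, a quick check along these lines might help. It is only a sketch: it assumes CPython, where pickle.Pickler is rebound to the C class when the accelerator imports successfully, and the helper name uses_c_pickle is made up for illustration.

```python
import pickle

def uses_c_pickle():
    # Assumption (CPython implementation detail): when the C accelerator
    # is available, pickle.Pickler is the class defined in the _pickle
    # module; otherwise it is the pure-Python fallback defined in pickle.
    return pickle.Pickler.__module__ == '_pickle'

print(uses_c_pickle())
```

If this prints False, the pure-Python implementation is in use and that alone could account for a large slowdown.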
The slow performance is most likely due to the poor performance of
Python 3's IO, which is caused by (among other things) a bad buffering
strategy. It's a Python 3 growing pain, and the IO layer is being
rewritten. Python 3.1 should be much faster, but it hasn't been
released yet.
As a workaround, mmap the file instead. For example (untested):
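import mmap
import pickle

f = open('dirlisting.dat', 'rb')
try:
    # Find the file size by seeking to the end, then rewind
    f.seek(0, 2)
    size = f.tell()
    f.seek(0, 0)
    m = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
    try:
        # Unpickle straight from the mapped bytes
        dir_listing = pickle.loads(m)
    finally:
        m.close()
finally:
    f.close()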
Pickling the output left as an exercise.
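One way the exercise might go, as a sketch only: serialize to a bytes object in memory and hand the file a single large write, so the slow buffered IO layer is touched as little as possible. The helper names save_listing and load_listing are made up for illustration, and load_listing is a plain-read alternative to the mmap approach, not a replacement for it.

```python
import pickle

def save_listing(obj, path):
    # Build the whole pickle in memory first, using the newest (binary)
    # protocol, then write it to disk in one call.
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, 'wb') as f:
        f.write(data)

def load_listing(path):
    # Read the whole file in one call and unpickle from memory.
    with open(path, 'rb') as f:
        data = f.read()
    return pickle.loads(data)
```

Usage would be something like save_listing(dir_listing, 'dirlisting.dat') after the scan, and dir_listing = load_listing('dirlisting.dat') on later runs. Whether this is fast enough will depend on the Python version in use.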
Carl Banks