pickle performance on larger objects
Sam Penrose
spenrose at intersight.com
Wed Jul 17 17:09:07 EDT 2002
On a recent project we decided to use pickle for some quick-and-dirty
object persistence. The object in question is a list of 3,000
dictionaries whose keys and values are short (< 100 character)
strings--about 1.5 megs worth of character data in total. Loading this
object from a pickle using cPickle took so long we assumed something
was broken.
In fact, loading is just slow. A list of 10,000 identical dictionaries
whose keys and values are short strings takes on the order of a minute
or two to load on modern hardware. Some details:
i.   A Python process which is loading a pickle will use a lot of RAM
     relative to the pickle's size on disk, roughly an order of
     magnitude more on Mac OS X.
ii.  Performance appears to scale linearly with changes in the size of
     the list or its dicts until you run out of RAM.
iii. Python pickle is only about 5x slower than cPickle as the list
     gets long, except that it uses more RAM and therefore hits a big
     RAM-to-disk swap performance falloff sooner.
iv.  You *can* tell a Mac's performance by its MHz. An 800 MHz PIII
     running Windows is almost exactly twice as fast as a 400 MHz G4
     running Mac OS X, both executing the following code from the
     command line. With 25 items in the dictionaries and 10K dicts
     used, the former took just under a minute using cPickle, the
     latter a little over two minutes.
v.   Generating a list of 3K heterogeneous dicts of 25 items (our real
     data) by reading in a 750k text file and splitting it up takes on
     the order of a second.
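
Item v does not describe how the text file is laid out, so the
following is only a sketch of what "reading in a text file and
splitting it up" could look like. The layout (records separated by
blank lines, one key=value pair per line) and the name parse_records
are assumptions made for illustration, not our actual format or code.

```python
def parse_records(text):
    """Split text into a list of dicts.

    Assumed (hypothetical) layout: records are separated by blank
    lines, and each line inside a record is a single 'key=value' pair.
    """
    records = []
    for block in text.split('\n\n'):
        record = {}
        for line in block.splitlines():
            if '=' in line:
                # Split on the first '=' only, so values may contain '='.
                key, value = line.split('=', 1)
                record[key] = value
        if record:
            records.append(record)
    return records


# Tiny made-up input in the assumed layout.
sample = "name=alpha\ncolor=red\n\nname=beta\ncolor=blue\n"
records = parse_records(sample)
```

Parsing of this kind is nothing more than string splitting and dict
assignment, which is the work item v is describing.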
Sample run on 400 MHz G4, 448 megs of RAM:
>>> time_cPickle_Load()
dumping list of 10 dicts:
dumped: 0.00518298149109
loading list of 10 dicts:
loaded: 0.1170129776
dumping list of 100 dicts:
dumped: 0.0329120159149
loading list of 100 dicts:
loaded: 0.849031090736
dumping list of 1000 dicts:
dumped: 0.397919893265
loading list of 1000 dicts:
loaded: 8.18722295761
dumping list of 10000 dicts:
dumped: 4.42434895039
loading list of 10000 dicts:
loaded: 133.906162977
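
One way to read the figures above is to divide each load time by the
list size and to compare load time with dump time. The sketch below
does that arithmetic in present-day Python; the timings are copied
from the sample run, and the helper names are illustrative only.

```python
# Timings copied from the sample run above (400 MHz G4), in seconds.
# Keys are the number of dicts in the list; values are (dump, load).
timings = {
    10: (0.00518298149109, 0.1170129776),
    100: (0.0329120159149, 0.849031090736),
    1000: (0.397919893265, 8.18722295761),
    10000: (4.42434895039, 133.906162977),
}


def per_dict_load_ms(timings):
    """Load time per dict, in milliseconds, for each list size."""
    return {n: 1000.0 * load / n for n, (dump, load) in timings.items()}


def load_to_dump_ratio(timings):
    """How many times longer loading took than dumping, per list size."""
    return {n: load / dump for n, (dump, load) in timings.items()}


if __name__ == "__main__":
    per_dict = per_dict_load_ms(timings)
    ratio = load_to_dump_ratio(timings)
    for n in sorted(timings):
        print("%6d dicts: %.2f ms per dict, load/dump = %.1f"
              % (n, per_dict[n], ratio[n]))
```

On these numbers the load cost stays between roughly 8 and 13 ms per
dict across three orders of magnitude of list size, and loading takes
roughly 20 to 30 times as long as dumping.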
#---code follows----------------//
def makeDict(numItems=25):
    d = {}
    for i in range(numItems):
        k = 'key%s' % i
        v = 'value%s' % i
        d[k] = v
    return d

def time_cPickle_Load():
    import time
    now = time.time
    from cPickle import dump, load
    filename = 'deleteme.pkl'
    for i in (10, 100, 1000, 10000):
        data = [makeDict() for j in range(i)]
        output = open(filename, 'w')
        startDump = now()
        print "dumping list of %s dicts:" % i
        dump(data, output)
        print "dumped:", now() - startDump
        output.close()
        input = open(filename)
        startLoad = now()
        print "loading list of %s dicts:" % i
        x = load(input)
        print "loaded:", now() - startLoad
        input.close()
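
For anyone repeating the test on Python 3, where cPickle no longer
exists as a separate module and pickle uses its C implementation when
available, a rough port might look like the following. It is a sketch
rather than the code that produced the numbers above: the function
names, the temporary file, and the protocol parameter are additions
for illustration. protocol=0 selects the ASCII format that the dump()
call above used by default.

```python
import os
import pickle
import tempfile
import time


def make_dict(num_items=25):
    """Same shape as makeDict above: short string keys and values."""
    return {'key%s' % i: 'value%s' % i for i in range(num_items)}


def time_pickle_load(sizes=(10, 100, 1000, 10000), protocol=0):
    """Return {size: (dump_seconds, load_seconds)} for each list size.

    Pass protocol=pickle.HIGHEST_PROTOCOL to time a binary format
    instead of the ASCII protocol 0.
    """
    results = {}
    fd, filename = tempfile.mkstemp(suffix='.pkl')
    os.close(fd)
    try:
        for n in sizes:
            data = [make_dict() for _ in range(n)]
            # Pickle files are opened in binary mode on Python 3.
            with open(filename, 'wb') as output:
                start = time.perf_counter()
                pickle.dump(data, output, protocol)
                dump_seconds = time.perf_counter() - start
            with open(filename, 'rb') as source:
                start = time.perf_counter()
                loaded = pickle.load(source)
                load_seconds = time.perf_counter() - start
            # Sanity check: the round trip must preserve the data.
            assert loaded == data
            results[n] = (dump_seconds, load_seconds)
    finally:
        os.remove(filename)
    return results


if __name__ == '__main__':
    for n, (dumped, loaded) in time_pickle_load().items():
        print('%6d dicts: dumped %.4f s, loaded %.4f s' % (n, dumped, loaded))
```

Returning the timings instead of printing them inside the loop keeps
the print calls out of the measured interval and makes it easy to
compare protocols on the same machine.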