writing large dictionaries to file using cPickle

perfreem at gmail.com
Wed Jan 28 11:13:10 EST 2009


hello all,

i have a large dictionary which contains about 10 keys, each key has a
value which is a list containing about 1 to 5 million (small)
dictionaries. for example,

mydict = {key1: [{'a': 1, 'b': 2, 'c': 'hello'},
                 {'d': 3, 'e': 4, 'f': 'world'}, ...],
          key2: [...]}

in total there are about 10 to 15 million small dictionaries if we
concatenate together the lists stored under every key in 'mydict'.
mydict is a structure that represents data in a very large file
(about 800 megabytes).
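
to make the shape concrete, here is a minimal sketch that builds a scaled-down version of the structure. the key names, the record fields and the sizes are placeholders rather than my real data, and make_sample is just a hypothetical helper for illustration.

```python
def make_sample(num_keys=10, records_per_key=1000):
    """Build a small dict shaped like mydict: key -> list of small dicts."""
    sample = {}
    for k in range(num_keys):
        # each value is a list of small dictionaries (placeholder fields)
        sample['key%d' % k] = [
            {'a': i, 'b': i * 2, 'c': 'hello'} for i in range(records_per_key)
        ]
    return sample

sample = make_sample()
# total number of small dictionaries across all keys
total = sum(len(records) for records in sample.values())
```

the real structure has the same layout, only with 1 to 5 million records per key instead of 1000.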

what is the fastest way to pickle 'mydict' into a file? right now i am
experiencing a lot of difficulties with cPickle when using it like
this:

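import cPickle as pickle        # cPickle is the module itself
pfile = open(my_file, 'wb')     # binary mode is safe for every protocol
pickle.dump(mydict, pfile)      # no protocol given, so protocol 0 is used
pfile.close()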

this creates a very large file (~ 300 MB), and it does so
*extremely* slowly. it writes about 1 megabyte every 5 to 10 seconds
and it gets slower and slower. it takes almost an hour, if not more,
to write the pickled object to the file.

is there any way to speed this up? i don't mind the large file... after
all the text file with the data used to make the dictionary was larger
(~ 800 MB) than the file it eventually creates, which is 300 MB.  but
i do care about speed...

i have tried to speed this up by switching to protocol 2, like this:

s = pickle.dumps(mydict, 2)
pfile.write(s)

but this takes just as long... any ideas? is there a different module
i could use that's more suitable for large dictionaries?
thank you very much.
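
p.s. for reference, here is a self-contained sketch of how the two attempts (default protocol 0 versus protocol 2) can be timed side by side. it runs on a scaled-down placeholder sample, not on my real data, so the numbers it gives say nothing about the full 800 MB case. timed_dump is a hypothetical helper name, and the fallback import is only an assumption so the sketch also runs where cPickle is not available.

```python
import os
import tempfile
import time

try:
    import cPickle as pickle   # Python 2, as used in this post
except ImportError:
    import pickle              # fallback (assumption, for portability only)

def timed_dump(obj, protocol):
    """Pickle obj to a temporary file; return (seconds, bytes written)."""
    fd, path = tempfile.mkstemp(suffix='.pkl')
    try:
        start = time.time()
        pfile = os.fdopen(fd, 'wb')   # binary mode works for every protocol
        try:
            pickle.dump(obj, pfile, protocol)
        finally:
            pfile.close()
        elapsed = time.time() - start
        return elapsed, os.path.getsize(path)
    finally:
        os.remove(path)

# scaled-down placeholder data with the same layout as mydict
sample = dict(
    ('key%d' % k, [{'a': i, 'b': 2 * i, 'c': 'hello'} for i in range(2000)])
    for k in range(5)
)

results = {}
for protocol in (0, 2):
    results[protocol] = timed_dump(sample, protocol)

for protocol in sorted(results):
    seconds, size = results[protocol]
    print('protocol %d: %.3f s, %d bytes' % (protocol, seconds, size))
```

this streams straight into the file with pickle.dump instead of building the whole string with pickle.dumps first, so the only thing that changes between the two runs is the protocol number.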


