pickle vs .pyc

Michael Hudson mwh21 at cam.ac.uk
Wed Jun 2 16:57:06 EDT 1999


Michael Vezie <mlv at pobox.com> writes:

> I need to be able to read a couple very complex (dictionary of arrays 
> of dictionaries, and array of dictionaries of array of dictionaries) 
> data structures into python.  To generate it by hand takes too long, 
> so I want to generate it once, and read it each time (the data doesn't 
> change).
> 
> The obvious choice is, of course pickle, or some flavor thereof.
> But can someone tell me why this wouldn't be faster:
> 
> In the code that does the "pickling", simply do:
> f = open("cache.py", "w")
> f.write("# cache file for fast,slow\n")
> f.write("fast = "+`fast`+'\n')
> f.write("slow = "+`slow`+'\n')
> f.close()
> import cache
> 
> Then, later, when I want the data, I just do:
> 
> from cache import fast,slow
> 
> and it's right there.  It's compiled, and seems really fast (loading a 
> 50k file in .12 seconds).  I just tried the same data using cPickle, and 
> it took 1.4 seconds.  It's also not as portable.  There is a space savings 
> with pickle, but it's only 5% (well, 56% if you count both the .py and 
> .pyc files), but that doesn't really matter to me.
> 
> Am I missing something here?  This sounds like an obvious, and fast, 
> way to do things.  True, the caching part may take longer.  But I 
> really don't care about that, since it's done only once, and in the 
> background.  
> 
> Michael
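
For anyone trying this on a current Python 3, here is a minimal sketch of the repr-and-import cache the question describes. It assumes Python 3, which removed backticks, so it calls repr() instead. The helper write_cache_module, the module name cache_demo and the sample data are hypothetical. The module is written to a temporary directory that is then put on sys.path.

```python
import importlib
import os
import sys
import tempfile

def write_cache_module(directory, module_name, **values):
    """Hypothetical helper: write each value as 'name = repr(value)'."""
    path = os.path.join(directory, module_name + ".py")
    with open(path, "w") as f:
        f.write("# cache file for " + ",".join(values) + "\n")
        for name, value in values.items():
            f.write(f"{name} = {value!r}\n")
    return path

# Hypothetical nested data: dicts of lists of dicts, and so on.
fast = {"key": [{"a": 1}, {"b": 2.5}]}
slow = [{"x": [{"y": "z"}]}]

cache_dir = tempfile.mkdtemp()
write_cache_module(cache_dir, "cache_demo", fast=fast, slow=slow)

# Make the new module importable; invalidate_caches() is needed because
# the file was created after the import system scanned the directory.
sys.path.insert(0, cache_dir)
importlib.invalidate_caches()
cache_demo = importlib.import_module("cache_demo")

print(cache_demo.fast == fast, cache_demo.slow == slow)
```

Importing the module compiles it, and Python 3 caches the bytecode as a .pyc under __pycache__. Later imports load that .pyc instead of reparsing the source, which is why the reload in the question is quick.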

Hmm, you're relying on all the data you're storing having faithful
__repr__ methods. This certainly isn't universally true. I'd regard
this method as too fragile.
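
To make the fragility concrete, here is a small Python 3 sketch. It uses ast.literal_eval as a safe stand-in for importing the generated module (an assumption for illustration only). The Point class and the round_trips helper are hypothetical. A user-defined class with the default __repr__ fails, and so does a plain float infinity, whose repr 'inf' is not valid Python source.

```python
import ast

class Point:
    """Hypothetical user-defined class with the default, non-evaluable __repr__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def round_trips(obj):
    """Return True if repr(obj) parses back into an equal literal value."""
    try:
        return ast.literal_eval(repr(obj)) == obj
    except (ValueError, SyntaxError):
        return False

print(round_trips({"key1": ["nested list"], 9: "mixed types"}))  # True
print(round_trips({"p": Point(1, 2)}))  # False: repr is "<... Point object at 0x...>"
print(round_trips(float("inf")))        # False: repr is 'inf', not a literal
```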

If you're only storing simple data (by which I mean simple types of
data, not that the data is simple), and I think you must be for your
approach to work at all, give the marshal module a whirl.

I think it will be substantially faster than your repr-based method
(cryptic hint: if it wasn't, the marshal module probably wouldn't
exist).

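E.g.:

import marshal

complex_data_structure = {'key1':['nested list'],9:"mixed types"}

# marshal writes binary data, so open the file in binary mode
f = open('/tmp/foo','wb')
marshal.dump(complex_data_structure,f)
f.close()

f = open('/tmp/foo','rb')
print(marshal.load(f))
f.close()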
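
Here is a Python 3 sketch of the same idea, extended to compare load times for marshal, pickle and the .pyc approach from the question. The sample data is hypothetical. The .pyc case is emulated by marshalling a compiled code object, which is what a .pyc file holds after its header, and then executing it. Timings depend on the machine and Python version, and modern pickle is C-accelerated (the old cPickle), so no particular ranking is claimed here.

```python
import marshal
import os
import pickle
import tempfile
import timeit

# Hypothetical sample data built only from types marshal supports
# (dicts, lists, strings, ints, floats).
data = {
    "key%d" % i: [{"name": "item%d" % j, "weight": j * 0.5} for j in range(20)]
    for i in range(200)
}

# marshal emits bytes, so the file must be opened in binary mode.
path = os.path.join(tempfile.mkdtemp(), "cache.marshal")
with open(path, "wb") as f:
    marshal.dump(data, f)
with open(path, "rb") as f:
    loaded = marshal.load(f)

# Emulate the .pyc approach: unmarshal a compiled code object, then run it.
code_blob = marshal.dumps(compile("data = " + repr(data), "<cache>", "exec"))
pickle_blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
marshal_blob = marshal.dumps(data)

def load_pyc_style():
    namespace = {}
    exec(marshal.loads(code_blob), namespace)
    return namespace["data"]

def load_marshal():
    return marshal.loads(marshal_blob)

def load_pickle():
    return pickle.loads(pickle_blob)

# Time only the loading side, which is what matters for a write-once cache.
for loader in (load_marshal, load_pickle, load_pyc_style):
    seconds = timeit.timeit(loader, number=20)
    print(f"{loader.__name__:15s} {seconds:.4f}s for 20 loads")
```

One design caveat: the Python documentation states that the marshal format may change between Python versions. A marshal cache therefore suits data you can regenerate, which fits the write-once, read-often case in the question.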

HTH
Michael

Random aside: something fishy's going on when I try to marshal
*arrays* (as opposed to mere lists):

>>> import array,marshal
>>> marshal.loads(marshal.dumps(array.array('f',[0,1])))
'\000\000\000\000\000\000\200?'
>>> 

That shouldn't be happening, should it? Surely that should be raising
an "unmarshallable object" exception? Oh well...
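
In current Python 3, marshal.dumps raises ValueError for objects it cannot handle, such as a bare object(). The sketch below probes that with a hypothetical helper, marshal_supports. It then shows one way, an assumption rather than anything from the post, to get a real array back: store the typecode and a list instead of relying on how marshal treats the array object itself.

```python
import array
import marshal

def marshal_supports(obj):
    """Hypothetical helper: return True if marshal.dumps accepts obj."""
    try:
        marshal.dumps(obj)
        return True
    except ValueError:
        return False

print(marshal_supports({"key1": ["nested list"], 9: "mixed types"}))  # True
print(marshal_supports(object()))  # False: "unmarshallable object"

# Store an array as (typecode, list) and rebuild it explicitly after loading.
arr = array.array("f", [0, 1])
blob = marshal.dumps((arr.typecode, arr.tolist()))
typecode, values = marshal.loads(blob)
restored = array.array(typecode, values)
print(restored == arr)  # True
```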






More information about the Python-list mailing list