compressed serialization module

Nick Craig-Wood nick at craig-wood.com
Wed Nov 19 04:30:34 EST 2008


greg <greg at cosc.canterbury.ac.nz> wrote:
>  Nick Craig-Wood wrote:
> > (Note that basic pickle protocol is likely to be more compressible
> > than the binary version!)
> 
>  Although the binary version may be more compact to
>  start with. It would be interesting to compare the
>  two and see which one wins.

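It is very data-dependent, of course, but in this case the binary
version wins: its bz2-compressed size is about half that of the text
pickle (524741 vs 1055197 bytes in the session below).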

However, there is exactly the same amount of information in the text
pickle and the binary pickle, so in theory a perfect compressor would
compress each to exactly the same size ;-)

>>> import os
>>> import bz2
>>> import pickle
>>> L = range(1000000)
>>> f = bz2.BZ2File("z.dat", "wb")
>>> pickle.dump(L, f)
>>> f.close()
>>> os.path.getsize("z.dat")
1055197L
>>> f = bz2.BZ2File("z1.dat", "wb")
>>> pickle.dump(L, f, -1)
>>> f.close()
>>> os.path.getsize("z1.dat")
524741L
>>>
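
For anyone repeating the comparison on Python 3, here is a rough sketch of
the same measurement done in memory rather than through files.  The helper
name compressed_pickle_size and the smaller list size are my own choices
for illustration, not part of the session above, and the sizes you get
will differ from the Python 2 numbers.

```python
import bz2
import pickle


def compressed_pickle_size(obj, protocol):
    """Return the bz2-compressed size, in bytes, of obj pickled with protocol."""
    return len(bz2.compress(pickle.dumps(obj, protocol)))


# Assumption: a smaller list than the original 1000000 items, to keep it quick.
data = list(range(100000))

text_size = compressed_pickle_size(data, 0)  # protocol 0, the text pickle
binary_size = compressed_pickle_size(data, pickle.HIGHEST_PROTOCOL)

print("protocol 0       :", text_size, "bytes")
print("highest protocol :", binary_size, "bytes")

# Sanity check: the compressed binary pickle round-trips to the same list.
blob = bz2.compress(pickle.dumps(data, pickle.HIGHEST_PROTOCOL))
restored = pickle.loads(bz2.decompress(blob))
```

Which protocol wins depends on the data and on the Python version, so it
is worth measuring with something close to your real workload.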

One practical consideration is that bz2 is quite CPU-expensive.  It
also has quite a large fixed overhead, e.g.

>>> len("a".encode("bz2"))
37

So if you are compressing lots of small things, the zip codec (zlib)
is a better choice

>>> len("a".encode("zip"))
9

It is also much faster!
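
The "a".encode(...) calls above are Python 2 idioms.  As a rough sketch for
Python 3, the same overhead and speed comparison can be made by calling the
bz2 and zlib modules directly.  The payload and the repeat count below are
arbitrary values I picked for illustration, and the timings will vary from
machine to machine.

```python
import bz2
import timeit
import zlib

small = b"a"

# Fixed overhead: compressed size of a one-byte input.
bz2_len = len(bz2.compress(small))
zlib_len = len(zlib.compress(small))
print("bz2 :", bz2_len, "bytes")
print("zlib:", zlib_len, "bytes")

# Rough speed comparison on a small payload (assumed values, not a benchmark).
payload = b"some small record " * 4
repeats = 2000
bz2_time = timeit.timeit(lambda: bz2.compress(payload), number=repeats)
zlib_time = timeit.timeit(lambda: zlib.compress(payload), number=repeats)
print("bz2  time: %.4f s for %d calls" % (bz2_time, repeats))
print("zlib time: %.4f s for %d calls" % (zlib_time, repeats))
```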

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick


