[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/datetime picklesize.py,NONE,1.1

Tim Peters tim.one@comcast.net
Tue, 03 Dec 2002 13:52:55 -0500


[Tim]
>> New program just to display pickle sizes.  This makes clear that the
>> copy_reg based C implementation is much more space-efficient in the
>> end than the __getstate__/__setstate__ based Python implementation,
>> but that 4-byte date objects still suffer > 10 bytes of overhead each
>> no matter how many of them you pickle in one gulp.
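
A minimal sketch of the two pickling styles in current Python 3, where copy_reg is spelled copyreg. StateDate and RegDate are hypothetical toy classes that each hold a 4-byte state; they are not the sandbox datetime code. The sketch assumes pickle protocol 1, the older binary protocol. Protocol 2 and later pickle __getstate__-based objects differently and narrow the gap.

```python
import copyreg
import pickle

# Hypothetical 4-byte state for 2000-12-13: 2-byte big-endian year, month, day.
STATE = bytes([0x07, 0xD0, 12, 13])


class StateDate:
    """Toy date pickled through __getstate__/__setstate__ (hypothetical)."""

    def __init__(self, state):
        self._state = state

    def __getstate__(self):
        return self._state

    def __setstate__(self, state):
        self._state = state


class RegDate:
    """Toy date pickled through a copyreg-registered reducer (hypothetical)."""

    def __init__(self, state):
        self._state = state


def _reduce_regdate(d):
    # Unpickling calls RegDate(state) directly: no _reconstructor, no BUILD.
    return RegDate, (d._state,)


copyreg.pickle(RegDate, _reduce_regdate)


def pickle_len(obj, protocol=1):
    # Protocol 1 is assumed here; protocols 2+ narrow the difference.
    return len(pickle.dumps(obj, protocol=protocol))


if __name__ == "__main__":
    for cls in (StateDate, RegDate):
        print(f"{cls.__name__}: {pickle_len(cls(STATE))} bytes")
```

Under protocol 1 the __getstate__ version pulls in copyreg._reconstructor plus a reference to object and a BUILD step. The registered reducer needs only the class, a one-element argument tuple and REDUCE.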

[Michael Hudson]
> Presumably there's a possibility of an optimization for pickling
> homogeneous (i.e. all the same type) lists (in pickle.py, not here).
>
> Hard to say whether it would be worth it, though.
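
Here is a user-level sketch of the homogeneous-list idea. It is not a patch to pickle.py. The hypothetical DateRun list subclass pickles all of its dates as one concatenated bytes string, using an assumed layout of a 2-byte big-endian year, then month, then day. The per-element scaffolding is therefore paid only once. Unpickling produces a plain list of datetime.date objects.

```python
import datetime
import pickle


def _unpack_dates(blob):
    # Every 4 bytes of blob: 2-byte big-endian year, then month, then day.
    return [
        datetime.date(int.from_bytes(blob[i:i + 2], "big"), blob[i + 2], blob[i + 3])
        for i in range(0, len(blob), 4)
    ]


class DateRun(list):
    """Hypothetical list of datetime.date objects with a packed pickle form."""

    def __reduce__(self):
        # One call to _unpack_dates with all states concatenated, instead of
        # a class back-reference, state tuple and REDUCE for every element.
        blob = b"".join(
            d.year.to_bytes(2, "big") + bytes([d.month, d.day]) for d in self
        )
        return _unpack_dates, (blob,)


if __name__ == "__main__":
    start = datetime.date(2000, 1, 1)
    dates = [start + datetime.timedelta(days=i) for i in range(100)]
    plain = pickle.dumps(dates, protocol=4)
    packed = pickle.dumps(DateRun(dates), protocol=4)
    print("plain list:", len(plain), "bytes; packed:", len(packed), "bytes")
    assert pickle.loads(packed) == dates  # comes back as a plain list
```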

I don't really care about lists of date objects.  The intent was just to see
how much of the administrative pickle bloat could be saved by pickle's
internal memo facility when multiple date objects appear in a structure for
*whatever* reason (be it a list or tuple of dates, or a dict keyed by dates,
or an object with multiple date-valued attributes, or ...).

The administrative pickle bloat for a single date instance is severe:

    pickling 2000-12-13 via Python -- pickle length 80
    pickling 2000-12-13 via C -- pickle length 43
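
One way to see where the bytes in a single-date pickle go is pickletools.genops, which walks the opcodes of a pickle. This sketch assumes current Python 3 and protocol 4, and opcode_breakdown is a hypothetical helper. The exact opcodes and lengths will not match the 2002 numbers above.

```python
import datetime
import pickle
import pickletools


def opcode_breakdown(obj, protocol=4):
    """Return (total_length, [(position, opcode_name, arg), ...]) for obj."""
    data = pickle.dumps(obj, protocol=protocol)
    # genops yields (opcode, arg, pos) triples without executing the pickle.
    ops = [(pos, op.name, arg) for op, arg, pos in pickletools.genops(data)]
    return len(data), ops


if __name__ == "__main__":
    total, ops = opcode_breakdown(datetime.date(2000, 12, 13))
    for pos, name, arg in ops:
        print(f"{pos:3} {name:16} {arg!r}")
    print("total:", total, "bytes")
```

Only the SHORT_BINBYTES argument carries the 4-byte date state. Everything else is naming the class and wiring up the REDUCE call.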

The internal pickle memo saves a lot if more than one date instance appears,
by "remembering" parts of the overhead scaffolding.  But in the end a date
object has a 4-byte state string, and the pickles still store far more
overhead than state.
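
The memo effect can be measured by pickling lists of distinct dates and dividing the growth over an empty list by the number of dates. This sketch assumes Python 3 and protocol 4, and marginal_bytes_per_date is a hypothetical helper. The dates must be distinct objects, because repeating a single object would let the memo replace every repeat with one back-reference.

```python
import datetime
import pickle


def marginal_bytes_per_date(n, protocol=4):
    """Average bytes each extra date adds to a pickled list of n distinct dates."""
    start = datetime.date(2000, 1, 1)
    dates = [start + datetime.timedelta(days=i) for i in range(n)]
    empty = len(pickle.dumps([], protocol=protocol))
    full = len(pickle.dumps(dates, protocol=protocol))
    return (full - empty) / n


if __name__ == "__main__":
    single = len(pickle.dumps(datetime.date(2000, 12, 13), protocol=4))
    print("one date on its own:", single, "bytes")
    for n in (1, 10, 100):
        print(f"{n:4} dates: {marginal_bytes_per_date(n):.2f} bytes/date")
```

The class reference is memoized after the first date. Each later date still pays for its own state tuple and REDUCE around the 4-byte state, so the per-date figure stays well above 4.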

> ...
> Here's a fairly simple-minded patch to the pickling side of pickle.py:
> it seems to save about 6 bytes per object in the good cases.
>
> with:
> list of 100 dates via      C -- 1236 bytes, 12.36 bytes/obj
>
> without:
> list of 100 dates via      C -- 1871 bytes, 18.71 bytes/obj

The "without" number is most curious.  When I run the checked-in code, I get

  list of 100 dates via      C -- 1533 bytes, 15.33 bytes/obj

This was w/ CVS Python, Win2K and MSVC6.  Ah!  You must have fiddled the
script to use pickle instead of cPickle.  I get 1871 bytes then.  I'm
surprised they're so different.
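
In current CPython the pure-Python and C picklers can be compared directly. pickle._Pickler is the private pure-Python class, and pickle.Pickler is the C accelerator when it is available. This sketch only measures sizes, with a hypothetical pickled_size helper and protocol 4 assumed. Whether the two picklers still disagree on size depends on the Python version.

```python
import datetime
import io
import pickle


def pickled_size(pickler_cls, obj, protocol=4):
    """Size of obj's pickle as produced by the given Pickler class."""
    buf = io.BytesIO()
    pickler_cls(buf, protocol).dump(obj)
    return len(buf.getvalue())


if __name__ == "__main__":
    start = datetime.date(2000, 1, 1)
    dates = [start + datetime.timedelta(days=i) for i in range(100)]
    # pickle._Pickler is private pure Python; pickle.Pickler is _pickle's C class.
    for label, cls in (("pure Python", pickle._Pickler), ("C", pickle.Pickler)):
        print(f"{label:12} {pickled_size(cls, dates)} bytes")
```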

> I'm not going to pursue this further unless someone thinks it's a
> worthwhile move.

I don't personally have large pickles of homogeneous lists, so hard to say.
The C implementation of pickle would also need to be fiddled.  Cutting the
pickle overheads for instances of new-style classes would more clearly be
worthwhile.
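
The following is a per-class workaround in current Python 3, not the pickle-level change suggested above. The hypothetical CompactPoint class defines __reduce__, so its pickle holds only the constructor arguments. PlainPoint relies on the default machinery, which writes NEWOBJ plus a state dict keyed by attribute names. Protocol 4 is assumed.

```python
import pickle


class PlainPoint:
    """New-style class using default pickling: __dict__ is saved as state."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class CompactPoint:
    """Same data; __reduce__ makes unpickling call CompactPoint(x, y)."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __reduce__(self):
        # Only the class and the two constructor arguments are pickled.
        return CompactPoint, (self.x, self.y)


if __name__ == "__main__":
    for obj in (PlainPoint(3, 4), CompactPoint(3, 4)):
        size = len(pickle.dumps(obj, protocol=4))
        print(f"{type(obj).__name__}: {size} bytes")
```

The saving comes from dropping the attribute-name strings, the dict opcodes and the BUILD step. Every class still has to opt in, which is why a change inside pickle itself would reach more code.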