[Python-Dev] Unpickling memory usage problem, and a proposed solution

Collin Winter collinwinter at google.com
Fri Apr 23 21:07:28 CEST 2010


On Fri, Apr 23, 2010 at 11:53 AM, Collin Winter <collinwinter at google.com> wrote:
> On Fri, Apr 23, 2010 at 11:49 AM, Alexandre Vassalotti
> <alexandre at peadrop.com> wrote:
>> On Fri, Apr 23, 2010 at 2:38 PM, Alexandre Vassalotti
>> <alexandre at peadrop.com> wrote:
>>> Collin Winter wrote a simple optimization pass for cPickle in Unladen
>>> Swallow [1]. The code reads through the stream and removes all the
>>> unnecessary PUTs in-place.
>>>
>>
>> I just noticed the code removes *all* PUT opcodes, regardless of
>> whether they are needed. So, this code can only be used if there's no
>> GET in the stream (which is unlikely for a large stream). I believe
>> Collin made this trade-off for performance reasons. However, it
>> wouldn't be hard to make the current code work like
>> pickletools.optimize().
>
> The optimization pass is only run if you don't use any GETs. The
> optimization is also disabled if you're writing to a file-like object.
> These tradeoffs were appropriate for the workload I was optimizing
> against.
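
To make the all-or-nothing scheme concrete, here is a rough pure-Python
sketch of the idea. It is not the Unladen Swallow C code: the helper name
strip_all_puts is hypothetical, it uses Python 3's pickle and pickletools
modules rather than cPickle, and it assumes an unframed stream (protocol 2
or lower) with no trailing data after STOP.

```python
import pickle
import pickletools

PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}


def strip_all_puts(data):
    """Hypothetical helper: drop every PUT opcode, but only when the
    stream contains no GET at all; otherwise return it unchanged."""
    ops = list(pickletools.genops(data))
    if any(op.name in GET_OPS for op, _arg, _pos in ops):
        return data  # some PUT is needed, so leave the stream alone
    out = bytearray()
    for i, (op, _arg, pos) in enumerate(ops):
        # An opcode's bytes run from its position to the next opcode's.
        end = ops[i + 1][2] if i + 1 < len(ops) else len(data)
        if op.name not in PUT_OPS:
            out += data[pos:end]
    return bytes(out)


# No object is referenced twice here, so the pickler emits no GET.
no_gets = pickle.dumps([1, "a", (2, 3)], protocol=2)
stripped = strip_all_puts(no_gets)

# The same list appears twice, so the pickler emits a GET for it.
shared = [1, 2]
with_gets = pickle.dumps([shared, shared], protocol=2)
untouched = strip_all_puts(with_gets)
```

The unpickler only needs a memo entry when a later GET refers to it, which
is why dropping every PUT is safe exactly when the stream has no GETs.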

I should add that adding the necessary bookkeeping to remove only
unused PUTs (instead of the current all-or-nothing scheme) should not
be hard. I'd watch out for a further performance/memory hit; the
pickling benchmarks in the benchmark suite should help assess this.
The current optimization penalizes pickling to speed up unpickling,
which made sense when optimizing pickles that would go into memcache
and be read out 13-15x more often than they were written.
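
For comparison, the "remove only unused PUTs" behaviour is what
pickletools.optimize() already provides in pure Python. The snippet below
is a small illustration using Python 3's pickle module, not the cPickle
change itself; the count_ops helper and the sample data are made up for
the example.

```python
import pickle
import pickletools

PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}


def count_ops(stream, names):
    """Hypothetical helper: count opcodes in the stream by name."""
    return sum(1 for op, _arg, _pos in pickletools.genops(stream)
               if op.name in names)


# `shared` is pickled once and then referenced again through a GET,
# so its PUT must survive; the other PUTs are never read back.
shared = ["x", "y"]
original = pickle.dumps([shared, shared, "z"], protocol=2)
optimized = pickletools.optimize(original)

puts_before = count_ops(original, PUT_OPS)
puts_after = count_ops(optimized, PUT_OPS)
gets_after = count_ops(optimized, GET_OPS)

restored = pickle.loads(optimized)
```

The extra pass over the stream is the bookkeeping cost mentioned above: it
is paid once at pickling time, and every later load sees a smaller stream
and does less memo work.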

Collin Winter

