Memory usage steadily going up while pickling objects

Giorgos Tzampanakis giorgos.tzampanakis at gmail.com
Sat Jun 15 05:41:43 EDT 2013


On 2013-06-15, Dave Angel wrote:

> On 06/14/2013 07:04 PM, Giorgos Tzampanakis wrote:
>> I have a program that saves lots of objects (about 800k) into a shelve
>> database (I'm using sqlite3dbm for this since all the default python dbm
>> packages seem to be unreliable and effectively unusable, but this is
>> another discussion).
>>
>> The process takes about 10-15 minutes. During that time I see memory usage
>> steadily rising, sometimes resulting in a MemoryError. Now, there is a
>> chance that my code is keeping unneeded references to the stored objects,
>> but I have debugged it thoroughly and haven't found any.
>>
>> So I'm beginning to suspect that the pickle module might be keeping an
>> internal cache of objects being pickled. Is this true?
>>
>
> You can learn quite a bit by using the sys.getrefcount() function.  If
> you think a variable has only one reference (if it had none, it'd be 
> very hard to test), and you call sys.getrefcount(), you can check if 
> your assumption is right.
>
> Note that if the object is part of a complex object, there may be 
> several mutual references, so the count may be more than you expect. 
> But you can still check the count before and after calling the pickle 
> stuff, and see if it has increased.
>
> Note that even if it has not, that doesn't prove you don't have a problem.
>
> Could the problem be the sqlite stuff?  Can you disable that part of the 
> logic, and see whether just creating the data still produces the leak?
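A minimal sketch of the before/after refcount check suggested above, assuming a hypothetical Record class standing in for the real stored objects. The helper refcount_delta() is also hypothetical; it only calls sys.getrefcount() around a pickle.dumps() whose result is thrown away.

```python
import pickle
import sys


class Record:
    """Hypothetical stand-in for one of the ~800k stored objects."""

    def __init__(self, n):
        self.values = list(range(n))


def refcount_delta(obj):
    """Hypothetical helper: change in obj's refcount across one pickle.dumps()."""
    before = sys.getrefcount(obj)
    pickle.dumps(obj)  # result deliberately discarded
    after = sys.getrefcount(obj)
    return after - before


rec = Record(1000)
print(refcount_delta(rec), refcount_delta(rec.values))
```

A nonzero delta for the object, or for anything it references, would suggest that something kept a reference after pickling. A zero delta is, as noted above, not proof that there is no leak.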

I tried both the standard shelve and sqlite3dbm, and sys.getrefcount() of
the stored object (and of every object it references) does not seem to go
up after it is stored. I also tried closing the shelve after storing each
object and reopening it immediately with the "n" flag (which tells it to
start with a new, empty database), and memory still rises at the same rate.
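To separate pickling from storage, as Dave suggested, one option is to pickle objects in a loop without storing them and measure live allocations with tracemalloc. This is a sketch under assumptions: make_object() and traced_growth() are hypothetical, and the dict they pickle only approximates the real objects. The keep=True mode deliberately holds every result to show what real growth looks like.

```python
import pickle
import tracemalloc


def make_object(i):
    """Hypothetical stand-in for the real objects being stored."""
    return {"id": i, "payload": [i] * 50}


def traced_growth(count, keep=False):
    """Hypothetical helper: bytes still allocated after pickling `count` objects."""
    kept = []
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        for i in range(count):
            data = pickle.dumps(make_object(i), protocol=pickle.HIGHEST_PROTOCOL)
            if keep:
                kept.append(data)  # simulate a leak: hold on to every result
        current, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline


print("discarded:", traced_growth(20000))
print("kept:     ", traced_growth(20000, keep=True))
```

If the "discarded" figure stays small while the loop size grows, pickling alone is probably not where the memory goes. The same loop can then be repeated with the storage layer added back in.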

So it seems that the pickle module does keep some kind of internal cache. I
would rather not resort to reading the pickle source code, but it looks like
I will have to...
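One cache the pickle documentation does describe is the Pickler "memo": it remembers objects it has already written, so that shared or recursive references are pickled once. It lives as long as the Pickler object, and Pickler.clear_memo() empties it. The sketch below uses a weakref to show that a reused Pickler keeps a pickled object alive until the memo is cleared. Record and survives_dump() are hypothetical names.

```python
import gc
import io
import pickle
import weakref


class Record:
    """Hypothetical stand-in for a stored object."""

    def __init__(self, i):
        self.i = i


def survives_dump(clear):
    """Hypothetical helper: is the object still alive after dump() and del?"""
    pickler = pickle.Pickler(io.BytesIO(), protocol=pickle.HIGHEST_PROTOCOL)
    rec = Record(1)
    ref = weakref.ref(rec)
    pickler.dump(rec)
    if clear:
        pickler.clear_memo()  # drop the memo's references to pickled objects
    del rec
    gc.collect()
    return ref() is not None  # `pickler` is still alive at this point


print(survives_dump(clear=False), survives_dump(clear=True))
```

In CPython, pickle.dumps() builds a new Pickler for each call, and shelve's __setitem__ also creates one per assignment. The memo therefore only accumulates when a single Pickler is reused for many dump() calls.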

-- 
Real (i.e. statistical) tennis and snooker player rankings and ratings:
http://www.statsfair.com/ 



More information about the Python-list mailing list