Looping-related Memory Leak

Carl Banks pavlovevidence at gmail.com
Mon Jun 30 20:24:18 EDT 2008


On Jun 30, 1:55 pm, Tom Davis <binju... at gmail.com> wrote:
> On Jun 26, 5:38 am, Carl Banks <pavlovevide... at gmail.com> wrote:
>
>
>
> > On Jun 26, 5:19 am, Tom Davis <binju... at gmail.com> wrote:
>
> > > I am having a problem where a long-running function will cause a
> > > memory leak / balloon for reasons I cannot figure out.  Essentially, I
> > > loop through a directory of pickled files, load them, and run some
> > > other functions on them.  In every case, each function uses only local
> > > variables and I even made sure to use `del` on each variable at the
> > > end of the loop.  However, as the loop progresses the amount of memory
> > > used steadily increases.
>
> > Do you happen to be using a single Unpickler instance?  If so, change
> > it to use a different instance each time.  (If you just use the module-
> > level load function you are already using a different instance each
> > time.)
>
> > Unpicklers hold a reference to everything they've seen, which prevents
> > the objects they unpickle from being garbage collected until the
> > Unpickler itself is collected.
>
> > Carl Banks
>
> Carl,
>
> Yes, I was using the module-level unpickler.  I changed it with little
> effect.  I guess perhaps this is my misunderstanding of how GC works.
> For instance, if I have `a = Obj()` and run `a.some_method()` which
> generates a highly-nested local variable that cannot be easily garbage
> collected, it was my assumption that either (1) completing the method
> call or (2) deleting the object instance itself would automatically
> destroy any variables used by said method.  This does not appear to be
> the case, however.  Even when a variable/object's scope is destroyed,
> it would seem that variables/objects created within that scope cannot
> always be reclaimed, depending on their complexity.
>
> To me, this seems illogical.  I can understand that the GC is
> reluctant to reclaim objects that have many connections to other
> objects and so forth, but once those objects' scopes are gone, why
> doesn't it force a reclaim?


Are your objects involved in circular references, and do you have any
objects with a __del__ method?  Normally objects are reclaimed when
the reference count goes to zero, but if there are cycles then the
reference count never reaches zero, and they remain alive until the
generational garbage collector makes a pass to break the cycle.
However, the generational collector doesn't break cycles that involve
objects with a __del__ method.
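
To see the difference between reference counting and the cycle
collector, here is a minimal sketch. The Node class and make_cycle
function are made up for illustration, and the behavior shown assumes
CPython. On Python versions before 3.4, a cycle whose objects define
__del__ is left in gc.garbage instead of being freed; later versions
can collect such cycles.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a    # a <-> b reference cycle
    return weakref.ref(a)      # weak reference lets us observe a's lifetime


gc.disable()                   # keep the automatic collector out of the way
probe = make_cycle()

# The local names a and b are gone, but each object still holds the other,
# so neither reference count reaches zero.
alive_before = probe() is not None

gc.collect()                   # an explicit pass finds and frees the cycle
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```

On CPython this should print True and then False: the pair survives
the end of the function and goes away only once the collector runs.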

Are you calling any C extensions that might be failing to decref an
object?  If so, that object's reference count never reaches zero and
its memory is leaked.

Are you keeping a reference around somewhere?  For example, appending
results to a list, where each result keeps a reference to all of your
unpickled data for some reason.
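
A rough sketch of that last pitfall, and of one way to hunt for it with
gc.get_referrers. The names results, Record, and process are
hypothetical stand-ins, not anything from your code.

```python
import gc

results = []   # hypothetical module-level list that outlives the loop


class Record:
    """Hypothetical result object that hangs on to its input."""

    def __init__(self, payload):
        self.payload = payload


def process(payload):
    rec = Record(payload)
    results.append(rec)       # the accidental long-lived reference
    return len(payload)


for i in range(3):
    data = [i] * 1000         # stands in for one unpickled file
    process(data)
    del data                  # removes the local name only

# Ask the collector which containers still refer to the first Record.
holders = gc.get_referrers(results[0])
found = any(h is results for h in holders)

print(len(results), found)
```

Using `del` in the loop does not help here, because every payload is
still reachable through the list.  Calling gc.get_referrers on an
object you expected to be dead is a quick way to find out what is
holding it.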


You know, we can throw out all these scenarios, but these suggestions
are just common pitfalls.  If it doesn't look like one of these
things, you're going to have to do your own legwork to help isolate
what's causing the behavior.  Then if needed you can come back to us
with more detailed information.

Start with your original function, and slowly remove functionality
from it until the bad behavior goes away.  That will give you a clue
what's causing it.
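
If it helps, the stripping-down can be made measurable.  This is only a
sketch, assuming a Python new enough to have the tracemalloc module
(3.4 or later); step_clean, step_leaky, and _cache are invented
stand-ins for the pieces of your real function.

```python
import tracemalloc

_cache = []   # hypothetical hidden state that makes step_leaky grow


def step_clean(n):
    return sum(range(n))          # nothing here outlives the call


def step_leaky(n):
    _cache.append(bytearray(n))   # retained on every call
    return n


def growth(func, iterations=50, n=10_000):
    """Return the net bytes still allocated after calling func repeatedly."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        func(n)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


for f in (step_clean, step_leaky):
    print(f.__name__, growth(f))
```

The leaky step should report growth on the order of iterations * n
bytes, while the clean one stays close to zero.  Wrap one piece of the
real loop body at a time and the piece whose number keeps climbing is
the one to look at.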


Carl Banks


