Looping-related Memory Leak

Tom Davis binjured at gmail.com
Tue Jul 1 09:50:49 EDT 2008


On Jun 30, 8:24 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Jun 30, 1:55 pm, Tom Davis <binju... at gmail.com> wrote:
>
>
>
> > On Jun 26, 5:38 am, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > > On Jun 26, 5:19 am, Tom Davis <binju... at gmail.com> wrote:
>
> > > > I am having a problem where a long-running function will cause a
> > > > memory leak / balloon for reasons I cannot figure out.  Essentially, I
> > > > loop through a directory of pickled files, load them, and run some
> > > > other functions on them.  In every case, each function uses only local
> > > > variables and I even made sure to use `del` on each variable at the
> > > > end of the loop.  However, as the loop progresses the amount of memory
> > > > used steadily increases.
>
> > > Do you happen to be using a single Unpickler instance?  If so, change
> > > it to use a different instance each time.  (If you just use the module-
> > > level load function you are already using a different instance each
> > > time.)
>
> > > Unpicklers hold a reference to everything they've seen, which prevents
> > > the objects they unpickle from being garbage collected until the
> > > unpickler itself is collected.
>
> > > Carl Banks
>
> > Carl,
>
> > Yes, I was using the module-level unpickler.  I changed it with little
> > effect.  I guess perhaps this is my misunderstanding of how GC works.
> > For instance, if I have `a = Obj()` and run `a.some_method()` which
> > generates a highly-nested local variable that cannot be easily garbage
> > collected, it was my assumption that either (1) completing the method
> > call or (2) deleting the object instance itself would automatically
> > destroy any variables used by said method.  This does not appear to be
> > the case, however.  Even when a variable/object's scope is destroyed,
> > it would seem that variables/objects created within that scope cannot
> > always be reclaimed, depending on their complexity.
>
> > To me, this seems illogical.  I can understand that the GC is
> > reluctant to reclaim objects that have many connections to other
> > objects and so forth, but once those objects' scopes are gone, why
> > doesn't it force a reclaim?
>
> Are your objects involved in circular references, and do you have any
> objects with a __del__ method?  Normally objects are reclaimed when
> the reference count goes to zero, but if there are cycles then the
> reference count never reaches zero, and they remain alive until the
> generational garbage collector makes a pass to break the cycle.
> However, the generational collector doesn't break cycles that involve
> objects with a __del__ method.

There are some circular references, but these are produced by objects
created by BeautifulSoup.  I try to decompose all of them, but if
there's one part of the code to blame it's almost certainly this.  I
have no objects with __del__ methods, at least none that I wrote.
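
A minimal sketch of the cycle behavior described above, assuming a
hypothetical Node class with parent/child links standing in for a
parse-tree node (it is not BeautifulSoup's actual class).  Reference
counting alone cannot free the tree once the last outside name is gone;
a pass of the cycle collector can:

```python
import gc
import weakref


class Node:
    """Hypothetical tree node that links to its parent and children."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def build_tree():
    root = Node()
    Node(parent=root)           # child <-> root now form a reference cycle
    return weakref.ref(root)    # a weak reference does not keep root alive


gc.disable()                    # keep automatic collection out of the demo
try:
    probe = build_tree()        # only the cycle itself still refers to root
    alive_before = probe() is not None
    found = gc.collect()        # force a full collection pass
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after, found)
```

If memory stays flat once gc.collect() is called at the end of each loop
iteration, cycles like this are a likely cause; if it keeps growing, the
objects are probably still reachable from somewhere.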

> Are you calling any C extensions that might be failing to decref an
> object?  There could be a memory leak.

Perhaps.  Yet another thing to look into.

> Are you keeping a reference around somewhere?  For example, appending
> results to a list, where each result keeps a reference to all of your
> unpickled data for some reason.

No.

> You know, we can throw out all these scenarios, but these suggestions
> are just common pitfalls.  If it doesn't look like one of these
> things, you're going to have to do your own legwork to help isolate
> what's causing the behavior.  Then if needed you can come back to us
> with more detailed information.
>
> Start with your original function, and slowly remove functionality
> from it until the bad behavior goes away.  That will give you a clue
> what's causing it.

I realize this and thank you folks for your patience.  I thought
perhaps there was something simple I was overlooking, but in this case
it would seem that there are dozens of things outside of my direct
control that could be causing this, most likely from third-party
libraries I am using. I will continue to try to debug this on my own
and see if I can figure anything out.  Memory leaks and failing GC and
so forth are all new concerns for me.
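
One way to do the legwork suggested above is to record traced memory
after every iteration and watch which version of the loop body makes it
climb.  This is a rough sketch only: process() and its sink list are
hypothetical placeholders for the real loop body, and tracemalloc
requires a Python version newer than the one in use when this thread was
written (it was added in 3.4).

```python
import tracemalloc


def process(item, sink):
    """Hypothetical loop body; it 'leaks' by appending to a shared list."""
    sink.append([item] * 1000)


def growth_per_iteration(n_iterations):
    """Return the traced allocation size (bytes) after each iteration."""
    sink = []
    sizes = []
    tracemalloc.start()
    try:
        for i in range(n_iterations):
            process(i, sink)
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
    finally:
        tracemalloc.stop()
    return sizes


sizes = growth_per_iteration(20)
print(sizes[0], sizes[-1])
```

Removing one piece of functionality from the loop body at a time and
re-running the measurement shows which piece is responsible for the
growth.  Note that tracemalloc only sees allocations made through
Python's allocator, so a leak inside a C extension may not show up here.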

Thanks Again,

Tom



