Object cleanup

psaffrey at googlemail.com
Wed May 30 12:01:20 EDT 2012


I am writing a screen scraping application using BeautifulSoup:

http://www.crummy.com/software/BeautifulSoup/

(which is fantastic, by the way).

I have an object that has two methods, each of which loads an HTML document and scrapes out some information, putting strings from the HTML documents into lists and dictionaries. I have a set of these objects from which I am aggregating and returning data. 

With a large number of these objects, the memory footprint becomes very large. The "soup" object is a local variable in each scraping method, so I assumed it would be cleaned up once the method returned. However, profiling with guppy shows that, after the methods have returned, most of the memory is still taken up by BeautifulSoup objects of one type or another. I'm not creating BeautifulSoup objects anywhere else.

I've tried assigning None to the "soup" variables at the end of each method and running the garbage collector manually, but this doesn't seem to help. I'd like to find out exactly which object "owns" the various BeautifulSoup structures, but I'm quite new to guppy and can't figure out how to do this.
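To make the situation concrete, here is a minimal sketch using only the standard library. It rests on an assumption I have not confirmed: that the values I store keep a back-reference into the parse tree. The Node class, scrape() and referrer_types() are hypothetical stand-ins written for illustration and are not BeautifulSoup's API; gc.collect(), gc.get_referrers() and weakref.ref() are the real standard-library calls.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a parse-tree element; not BeautifulSoup's API."""

    def __init__(self, text, parent=None):
        self.text = text
        self.parent = parent  # back-reference into the tree


def scrape(results):
    """Build a local 'soup', store one child node, return a weak ref to the soup."""
    soup = Node("root")
    results.append(Node("value", parent=soup))
    probe = weakref.ref(soup)  # lets the caller check whether soup was freed
    soup = None                # what I tried: drop the local name...
    gc.collect()               # ...and collect manually
    return probe


def referrer_types(obj):
    """Type names of the objects that directly refer to obj, per gc.get_referrers."""
    return sorted({type(r).__name__ for r in gc.get_referrers(obj)})


results = []
probe = scrape(results)
print(probe() is not None)      # is the soup still alive after the method returned?
print(referrer_types(probe()))  # which kinds of object still refer to it?

# Keep only a plain string copy of the text, then collect again.
results[0] = str(results[0].text)
gc.collect()
print(probe() is None)          # has the soup been freed now?
```

If the assumption holds, setting the local name to None cannot help, because the stored child still refers to the tree; keeping only plain string copies of the scraped values would remove that reference. Whether my real code matches this pattern is exactly what I'm unsure about.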

How do I force the memory for these soup objects to be freed? Is there anything else I should be looking at to find the cause of these problems?

Peter



More information about the Python-list mailing list