Memory problem with Python

Josiah Carlson josiah.carlson at sbcglobal.net
Mon Jun 18 15:45:30 EDT 2007


Squzer Crawler wrote:
> On Jun 18, 11:06 am, "sor... at gmail.com" <sor... at gmail.com> wrote:
>> On Jun 17, 8:51 pm, Squzer Crawler <Squ... at gmail.com> wrote:
>>
>>> I am developing a distributed environment at my college using Python.
>>> I am using threads in the client for downloading webpages. Even though
>>> I am reusing the threads, memory usage keeps increasing, and I don't
>>> know why. I am using BerkeleyDB for the URL queue and BeautifulSoup
>>> for parsing the webpages.
>>       Isn't the increased memory usage a result of storing the
>> already processed pages?
>>
>>       Look first at all the places where your code instantiates new
>> objects - and make sure you don't keep references to objects that
>> are no longer needed.
>>
>>       Also, reusing threads has nothing to do with saving memory - but
>> with saving on thread creation time, if I understand your problem
>> description.
> 
> What about cyclic references? Can I use GC in my program?
> 
> If so, please tell me how to implement it. I am calling gc.collect()
> at the end of each fetch. Will it slow my program down? If so, how
> else should I call it?

Garbage collection should happen automatically as long as you are 
deleting references to objects you no longer need.  If gc.garbage isn't 
empty, then you have unbreakable reference cycles.  It seems more 
likely, as soring at gmail says, that you are keeping copies of the things 
you already parsed in memory.
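
If you want to check whether cycles are actually the problem, a 
minimal diagnostic along these lines will tell you (check_for_cycles 
is just an illustrative name, not something from your code or from 
any library):

    import gc

    def check_for_cycles():
        # Force a full collection; gc.collect() returns the number
        # of unreachable objects it found.
        unreachable = gc.collect()
        print("collected %d unreachable objects" % unreachable)

        # Objects the collector found but could NOT free end up in
        # gc.garbage; a non-empty list means unbreakable cycles
        # (e.g. cycles through objects with __del__ methods).
        if gc.garbage:
            print("%d uncollectable objects:" % len(gc.garbage))
            for obj in gc.garbage[:10]:
                print("  %r" % (obj,))

Call it after each batch of fetches; if the counts stay near zero 
but memory still grows, the leak is ordinary live references, not 
cycles.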

What you can do (if you aren't able to find the bug) is have a wrapper 
program that repeatedly starts up your url fetcher via os.system(). 
Then have your url fetcher close itself down every few hours.
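
Something like this sketch would do it (crawler.py is a hypothetical 
name for your fetcher script, which is assumed to exit on its own 
after a few hours):

    import os
    import time

    # Restart the fetcher forever; each run releases all of its
    # memory back to the OS when the process exits.
    while True:
        status = os.system("python crawler.py")
        print("fetcher exited with status %d; restarting" % status)
        time.sleep(5)  # brief pause so a crashing script can't spin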

  - Josiah
