python resource management

Tim Arnold tim.arnold at sas.com
Mon Jan 19 12:44:52 EST 2009


"Philip Semanchuk" <philip at semanchuk.com> wrote in message 
news:mailman.7530.1232375454.3487.python-list at python.org...
>
> On Jan 19, 2009, at 3:12 AM, S.Selvam Siva wrote:
>
>> Hi all,
>>
>> I am running a Python script which parses nearly 22,000 locally stored
>> HTML files using BeautifulSoup.
>> The problem is that memory usage increases linearly as the files are
>> parsed. By the time the script has parsed 200 files or so, it consumes
>> all the available RAM, and CPU usage drops to 0% (possibly due to
>> excessive paging).
>>
>> We tried 'del soup_object' and 'gc.collect()', but saw no improvement.
>>
>> Please advise how to limit Python's memory usage, or the proper way to
>> handle BeautifulSoup objects in a resource-efficient manner.
>
> You need to figure out where the memory is disappearing. Try commenting
> out parts of your script. For instance, maybe start with a minimalist
> script: open and close the files but don't process them. See if the
> memory usage continues to be a problem. Then add elements back in, making
> your minimalist script more and more like the real one. If the extreme
> memory usage problem is isolated to one component or section, you'll find
> it this way.
>
> HTH
> Philip
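
Philip's minimalist-script idea can be sketched roughly like this. Note the 
names here (`run_stage`, the `do_parse` callable) are placeholders of mine, 
not from your script, and the `resource` module is Unix-only:

```python
import gc
import glob
import os
import resource  # Unix-only; getrusage reports the process's peak memory


def run_stage(paths, do_read=False, do_parse=None):
    """Run over the files with later stages disabled, to isolate the leak.

    do_parse, if given, is a callable standing in for the real
    BeautifulSoup step (a placeholder, not the actual parser).
    """
    for path in paths:
        with open(path) as f:            # stage 1: open/close only
            if do_read:
                data = f.read()          # stage 2: also read the contents
                if do_parse is not None:
                    tree = do_parse(data)  # stage 3: also parse
                    del tree
        gc.collect()
    # peak resident set size so far (kilobytes on Linux)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

Run it once per stage, enabling one more step each time, and compare the 
peaks; the stage where the number starts climbing is where the memory goes.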

Also, are you creating a separate soup object for each file, or reusing one 
object over and over?
--Tim
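
For what it's worth, the per-file pattern I mean looks like this, sketched 
with the stdlib HTMLParser standing in for BeautifulSoup (the TitleGrabber 
class is just an illustration, not your code):

```python
import gc
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Tiny example parser: collects the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def titles(html_docs):
    results = []
    for doc in html_docs:
        parser = TitleGrabber()  # fresh parser object per document
        parser.feed(doc)
        results.append(parser.title)
        del parser               # drop the reference before the next file
        gc.collect()             # sweep up any reference cycles
    return results
```

If instead you keep feeding one long-lived object, state can accumulate 
across files; if you already create a fresh soup per file and memory still 
grows, the references are being held somewhere else.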




