python resource management
Tim Arnold
tim.arnold at sas.com
Mon Jan 19 12:44:52 EST 2009
"Philip Semanchuk" <philip at semanchuk.com> wrote in message
news:mailman.7530.1232375454.3487.python-list at python.org...
>
> On Jan 19, 2009, at 3:12 AM, S.Selvam Siva wrote:
>
>> Hi all,
>>
>> I am running a Python script that uses BeautifulSoup to parse nearly
>> 22,000 locally stored HTML files.
>> The problem is that memory usage increases linearly as the files are
>> parsed.
>> After the script has parsed 200 files or so, it consumes all the
>> available RAM and CPU usage drops to 0% (probably due to excessive
>> paging).
>>
>> We tried 'del soup_object' and 'gc.collect()', but saw no improvement.
>>
>> Please advise on how to limit Python's memory usage, or on the proper
>> way to handle BeautifulSoup objects in a resource-efficient manner.
>
> You need to figure out where the memory is disappearing. Try commenting
> out parts of your script. For instance, maybe start with a minimalist
> script: open and close the files but don't process them. See if the
> memory usage continues to be a problem. Then add elements back in, making
> your minimalist script more and more like the real one. If the extreme
> memory usage problem is isolated to one component or section, you'll find
> it this way.
>
> HTH
> Philip
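Philip's incremental approach could start with a skeleton like the one below: open and read each file but skip parsing entirely, then add BeautifulSoup back in once the baseline is confirmed flat. This is only a sketch; `check_baseline` and the directory argument are illustrative names, not anything from the original script.

```python
import glob
import os

def check_baseline(html_dir):
    """Open and read every .html file but do no parsing at all.

    If memory still climbs while running just this, the leak is in the
    file handling, not in BeautifulSoup; if it stays flat, add the
    parsing step back in and measure again.
    """
    count = 0
    for path in glob.glob(os.path.join(html_dir, "*.html")):
        with open(path) as f:
            f.read()  # read and discard; no soup object is created
        count += 1
    return count
```

Watching the process in top (or Task Manager) while this runs gives the baseline; each component you add back afterwards should be measured the same way.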
Also, are you creating a separate soup object for each file, or reusing one
object over and over?
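The reason it matters: a parse tree holds many parent/child cross-references, so keeping one soup alive (or holding references into it) can keep the whole tree from being collected. Creating a fresh, short-lived parser per document and extracting only what you need lets each tree become garbage as soon as the function returns. A stdlib sketch of that pattern, using html.parser in place of BeautifulSoup (`TitleGrabber` and `extract_title` are hypothetical names for illustration):

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Pull the <title> text out of a document; everything else is discarded."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_title(html_text):
    parser = TitleGrabber()  # fresh parser per document, nothing shared
    parser.feed(html_text)
    parser.close()
    return parser.title      # only a small string survives; the parser
                             # is garbage as soon as this returns
```

The same shape applies with BeautifulSoup: build the soup inside a per-file function, copy out the small pieces of data you need (as plain strings, not tag objects), and let the soup go out of scope before moving to the next file.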
--Tim