Memory problems (garbage collection)

Peter Otten __peter__ at web.de
Thu Apr 23 04:13:45 EDT 2009


Carbon Man wrote:

> Very new to Python, running 2.5 on windows.
> I am processing an XML file (7.2MB). Using the standard library I am
> recursively processing each node and parsing it. The branches don't go
> particularly deep. What is happening is that the program is running really
> really slowly, so slowly that even running it overnight, it still doesn't
> finish.
> Stepping through it I have noticed that memory usage has shot up from
> 190MB to 624MB and continues to climb. If I set a break point and then
> stop the program the memory is not released. It is not until I shut down
> PythonWin that the memory gets released.
> I thought this might mean objects were not getting GCed, so through the
> interactive window I imported gc. gc.garbage is empty. gc.collect() seems
> to fix the problem (after much thinking) and reports 2524104. Running it
> again returns 0.
> I thought that garbage collection was automatic, if I use variables in a
> method do I have to del them?

No. Deleting a local variable only decrements the reference count of the
object it refers to. In your code, the next iteration of the for loop or the
return from the method has the same effect, and either one happens directly
after your del statements.
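A minimal sketch of that point, assuming CPython's reference counting and a
current Python 3. The Node class and the process() function are hypothetical
stand-ins and are not taken from the posted program. Weak references are used
only to observe when an object is actually freed.

```python
import weakref


class Node(object):
    """Hypothetical stand-in for a parsed XML node."""


def process(nodes):
    """Loop over nodes without any del statement; return weak references."""
    refs = []
    for node in nodes:
        # Rebinding `node` on the next iteration drops the previous reference,
        # exactly as an explicit `del node` at the end of the body would.
        refs.append(weakref.ref(node))
    return refs


# The list of nodes is only referenced for the duration of the call.
refs = process([Node() for _ in range(3)])
alive_after_loop = [r() is not None for r in refs]
print(alive_after_loop)

# del removes one name; the object survives while another reference exists.
a = Node()
b = a
watcher = weakref.ref(a)
del a
still_alive = watcher() is not None
b = None  # dropping the last reference frees the object
freed = watcher() is None
print(still_alive, freed)
```

On CPython the first print shows that no node outlives the function even
though nothing was deleted by hand; other implementations may free objects
later.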

> I tried putting a "del node" in all my for node in .... loops but that
> didn't help. collect() reports the same number. Tried putting gc.collect()
> at the end of the loops but that didn't help either.
> If I have the program at a break and do gc.collect() it doesn't fix it, so
> whatever referencing is causing problems is still active.
> My program is parsing the XML and generating a Python program for
> SQLalchemy, but the program never gets a chance to run; the memory problem
> occurs before that. It probably has something to do with the way I am
> building strings.
> 
> My apologies for the long post but without being able to see the code I
> doubt anyone can give me a solid answer so here it goes (sorry for the
> lack of comments):

First, use a small XML file to check that your program terminates and
operates correctly. Then try disabling cyclic garbage collection with
gc.disable() and remove the gc.collect() calls.

This will not help with the memory footprint, but when you are creating many
new objects that you want to keep, Python sometimes spends a lot of time
looking in vain for unreachable objects, so there may be a speedup.
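A rough way to check whether the collector is costing time, written as a
sketch for a current Python 3. The build() and timed_build() helpers and the
object count are made up for illustration. The timings depend on the
interpreter version and the machine, so treat the numbers as a hint only.

```python
import gc
import time


def build(n):
    """Create many small container objects that all stay alive."""
    return [[i] for i in range(n)]


def timed_build(n, use_gc):
    """Time build(n) with the cyclic collector on or off, then restore it."""
    was_enabled = gc.isenabled()
    if use_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        data = build(n)
        elapsed = time.perf_counter() - start
    finally:
        # Put the collector back the way we found it.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return len(data), elapsed


state_before = gc.isenabled()
count_on, t_on = timed_build(200000, use_gc=True)
count_off, t_off = timed_build(200000, use_gc=False)
print("gc enabled:  %.3fs" % t_on)
print("gc disabled: %.3fs" % t_off)
```

If the two timings are close, the cyclic collector is probably not the
bottleneck and the slowdown lies elsewhere.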

> from xml.dom import minidom
> import os
> import gc

gc.disable()

[snip more code]

Does this improve things?

Like Gerhard says, in the long run you are probably better off with
ElementTree.
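For illustration, a minimal sketch of incremental parsing with the standard
library's xml.etree.ElementTree. The sample document, the "item" tag and the
iter_items() helper are assumptions, since the real file's structure is not
shown in this thread.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's layout is not known from the thread.
SAMPLE = b"<root><item id='1'>a</item><item id='2'>b</item></root>"


def iter_items(source, tag):
    """Yield (attributes, text) for each element named `tag`."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Copy what we need before clearing the element.
            yield dict(elem.attrib), elem.text
            # Drop the element's children, text and attributes so the data
            # of already processed elements can be freed.
            elem.clear()


items = list(iter_items(io.BytesIO(SAMPLE), "item"))
print(items)
```

iterparse() also accepts a file name. Note that the cleared elements stay
attached to the root as empty shells, which is usually acceptable for a file
of a few megabytes.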

Peter


