Memory problems (garbage collection)

Gerhard Häring gh at ghaering.de
Thu Apr 23 03:25:30 EDT 2009


Carbon Man wrote:
> Very new to Python, running 2.5 on windows.
> I am processing an XML file (7.2MB). Using the standard library I am 
> recursively processing each node and parsing it. The branches don't go 
> particularly deep. What is happening is that the program is running really 
> really slowly, so slow that even running it over night, it still doesn't 
> finish.
> Stepping through it I have noticed that memory usage has shot up from 190MB 
> to 624MB and continues to climb. 

That does sound like a problem in the code. But even though the XML file
is only 7.2 MB, the parsed XML structures and whatever you build from them
carry some overhead.

> If I set a break point and then stop the 
> program the memory is not released. It is not until I shutdown PythonWin 
> that the memory gets released.

Then you're apparently looking at VSIZE or whatever it's called on
Windows. It's the maximum memory the process ever allocated. And this
usually *never* decreases, no matter what the application (Python or
otherwise).

> [GC experiments]

Unless you have circular references, in my experience automatic garbage
collection in Python works fine. I never had to mess with it myself in
10 years of Python usage.
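
To make the circular-reference caveat concrete, here is a minimal sketch. It assumes CPython, where reference counting frees most objects immediately and the gc module only has to deal with cycles. The Node class and both function names are made up for illustration.

```python
import gc
import weakref


class Node(object):
    """Hypothetical class, only here so there is something to reference."""

    def __init__(self):
        self.other = None


def plain_object_is_freed():
    """Without a cycle, dropping the last reference frees the object."""
    n = Node()
    probe = weakref.ref(n)   # a weak reference does not keep n alive
    del n
    return probe() is None


def cycle_is_collected():
    """With a cycle, the objects survive until the cycle detector runs."""
    gc.disable()             # keep the automatic collector out of the demo
    try:
        a = Node()
        b = Node()
        a.other = b
        b.other = a          # a and b now reference each other
        probe = weakref.ref(a)
        del a, b             # the cycle keeps both reference counts above zero
        alive_before = probe() is not None
        gc.collect()         # explicit run of the cycle detector
        alive_after = probe() is not None
        return alive_before, alive_after
    finally:
        gc.enable()
```

If gc.collect() does not bring memory back, as described in the quoted text below, the objects are most likely still reachable from somewhere rather than stuck in an uncollected cycle.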

> If I have the program at a break and do gc.collect() it doesn't fix it, so 
> whatever referencing is causing problems is still active.
> My program is parsing the XML and generating a Python program for 
> SQLalchemy, but the program never gets a chance to run the memory problem is 
> prior to that. It probably has something to do with the way I am string 
> building.

Yes, you're apparently concatenating strings. A lot. Don't do that. At
least not this way:

s = ""
s += "something"
s += "else"

Instead, do this:

from cStringIO import StringIO

s = StringIO()
s.write("something")
s.write("else")
...
s.seek(0)
print s.read()

or

lst = []
lst.append("something")
lst.append("else")
print "".join(lst)
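
For readers on Python 3, where cStringIO no longer exists and print is a function, the same two approaches might look like the sketch below. The function names build_with_stringio and build_with_join are invented for this example.

```python
import io


def build_with_stringio(parts):
    """Accumulate the pieces in an in-memory text buffer."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    # getvalue() returns everything written so far, so no seek(0) is needed
    return buf.getvalue()


def build_with_join(parts):
    """Collect the pieces in a list and join them once at the end."""
    pieces = []
    for part in parts:
        pieces.append(part)
    return "".join(pieces)


if __name__ == "__main__":
    print(build_with_stringio(["something", "else"]))
    print(build_with_join(["something", "else"]))
```

Both variants avoid creating a new, longer string object on every step, which is what repeated += can end up doing.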


> My apologies for the long post but without being able to see the code I 
> doubt anyone can give me a solid answer so here it goes (sorry for the lack 
> of comments): [...]

Code snipped.

Two tips: Use one of the above methods for concatenating strings. This
is a common problem in Python (and in other languages: Java and C# have
StringBuilder classes for the same reason).

If you want to speed up your XML processing, use the ElementTree module
in the standard library. It's a lot easier to use and also faster than
what you're using currently. As a bonus, it can be swapped out for the
even faster lxml module (a third-party package, not in the standard
library) by changing a single import, for another noticeable performance
improvement.
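
A minimal sketch of what that could look like. The original code was snipped, so the XML layout here (tables containing columns) and the function name columns_by_table are pure assumptions for illustration, not the poster's actual schema.

```python
import xml.etree.ElementTree as ET
# With lxml installed, the single-import swap would be:
# from lxml import etree as ET

# Invented sample document; the real file's structure is not known.
SAMPLE = """<tables>
  <table name="customer"><column name="id"/><column name="name"/></table>
  <table name="invoice"><column name="id"/></table>
</tables>"""


def columns_by_table(xml_text):
    """Map each table name to the list of its column names."""
    root = ET.fromstring(xml_text)
    result = {}
    for table in root.findall("table"):
        names = [col.get("name") for col in table.findall("column")]
        result[table.get("name")] = names
    return result


if __name__ == "__main__":
    print(columns_by_table(SAMPLE))
```

For a file on disk, ET.parse(filename).getroot() would take the place of ET.fromstring().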

HTH

-- Gerhard



