4DOM eating all my memory
John J. Lee
jjl at pobox.com
Sun Feb 1 19:27:09 EST 2004
ewan <frimn at hotmail.com> writes:
> I'm looping over a set of urls pulled from a database, fetching the
> corresponding webpage, and building a DOM tree for it using
> xml.dom.ext.reader.HtmlLib (then trying to match titles in a web library
> catalogue).
Hmm, if this is open-source and it's more than a quick hack, let me
know when you have it working -- I maintain a page on open-source
stuff of this nature (bibliographic and cataloguing).
> all the trees seem to be kept in memory,
>
> however, when I get through fifty or so iterations the program has used
> about half my memory and slowed the system to a crawl.
>
> tried turning on all gc debugging flags. they produce lots of output, but it
> all says 'collectable' - sounds fine to me.
I've never had to resort to this... does it tell you what types /
classes are involved? IIRC, there was some code posted to python-dev
to give hints about this (though I guess that was mostly/always for
debugging leaks at the C level).
> I even tried doing gc.collect() at the end of every iteration. nothing.
> everything seems to be being collected. so why does each iteration increase
> the memory usage by several megabytes?
>
> below is some code (and by the way, do I have those 'global's in the right
> places?)
Yes, they're in the right places. Not sure a global is really needed,
though -- a local would do, since nothing outside find() uses root...
> any suggestions would be appreciated immeasurably...
[...]
> def find(self, title, uri):
> global root
>
> reader = HtmlLib.Reader()
> root = reader.fromUri(uri)
>
> # find what we're looking for
> ...
+         reader.releaseNode(root)

DOM nodes refer to their parents and children in both directions, and
4DOM expects you to break those cycles yourself by calling
releaseNode() once you've finished with a tree; without it, every tree
built in the loop stays alive. Does adding that at the end of find()
cure it?
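releaseNode() is 4DOM-specific, but the stdlib's xml.dom.minidom has the
same release-after-use pattern via unlink(). A minimal sketch of the idea
-- find_title() is a made-up stand-in for your find() method, and minidom
wants well-formed XML rather than arbitrary HTML:

```python
from xml.dom.minidom import parseString

def find_title(markup):
    """Parse a small document, extract the title, and release the tree.

    unlink() plays the role of 4DOM's reader.releaseNode(): it breaks
    the parent<->child reference cycles so the tree can be freed
    promptly instead of waiting for the cycle collector.
    """
    doc = parseString(markup)
    try:
        title_node = doc.getElementsByTagName("title")[0]
        return title_node.firstChild.data
    finally:
        doc.unlink()   # break the cycles; the extracted string survives

title = find_title("<html><head><title>Catalogue</title></head></html>")
print(title)   # -> Catalogue
```

The try/finally matters: the tree gets released even if the lookup raises
partway through the loop.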
John
More information about the Python-list mailing list