4DOM eating all my memory

ewan frimn at hotmail.com
Sun Feb 1 01:34:02 EST 2004


hello all -

I'm looping over a set of urls pulled from a database, fetching the
corresponding webpage, and building a DOM tree for it using
xml.dom.ext.reader.HtmlLib (then trying to match titles in a web library
catalogue). All the trees seem to be kept in memory: by the time I get
through fifty or so iterations, the program has used about half my memory
and slowed the system to a crawl.

I tried turning on all the gc debugging flags. They produce lots of output,
but it's all marked 'collectable' - which sounds fine to me.

I even tried calling gc.collect() at the end of every iteration. Nothing
changed. Everything seems to be getting collected, so why does each
iteration increase the memory usage by several megabytes?

Below is some code (and by the way, do I have those 'global's in the right
places?)

any suggestions would be appreciated immeasurably...
ewan



import MySQLdb

...

cursor = db.cursor()
result = cursor.execute("""SELECT CALLNO, TITLE FROM %s""" % table)
rows = cursor.fetchall()
cursor.close()
  
for row in rows:
  current_callno = row[0]
  title = row[1]
  url = construct_url(title)
  cf = callno_finder()
  cf.find(title.decode('latin-1'), url)
  ...

(meanwhile, in another file)
from xml.dom.ext.reader import HtmlLib
...

class callno_finder:
  def __init__(self):
    global root  # rebinds a module-level 'root', shared by all instances
    root = None

  def find(self, title, uri):
    global root

    reader = HtmlLib.Reader()
    root = reader.fromUri(uri)  # the previous tree is only unreferenced here

    # find what we're looking for
    ...
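In case it helps to see what I mean about the globals: an instance-attribute version (no globals at all) might look like the sketch below. Again this uses stdlib minidom and a made-up release() method for illustration; the real code parses from a URI with HtmlLib.Reader().fromUri(uri):

```python
from xml.dom.minidom import parseString  # stand-in for HtmlLib in this sketch

class CallnoFinder:
    """Hold the tree on the instance instead of in a module-level global."""

    def __init__(self):
        self.root = None  # per-instance state, not shared across instances

    def find(self, title, markup):
        # the real code does: self.root = HtmlLib.Reader().fromUri(uri)
        self.root = parseString(markup)
        # ... match the title against the tree; simplified to a substring test
        return title in self.root.documentElement.toxml()

    def release(self):
        # explicitly tear down the tree when done with it
        if self.root is not None:
            self.root.unlink()
            self.root = None
```

With this shape, each iteration of the loop can call release() before moving on, so at most one tree is reachable at a time.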



More information about the Python-list mailing list