[XML-SIG] Unexpected memory leak in Sax + _locator

Bernhard Herzog bh@intevation.de
07 Feb 2002 12:14:36 +0100


I found an unexpected memory leak in the sax contenthandler +
expatparser (tested with Python 2.2 compiled with expat). The following
script demonstrates the leak:

import gc
gc.set_debug(gc.DEBUG_LEAK)

import xml.sax.handler

xmldata = """<?xml version="1.0" standalone="yes" ?>
<Inquisition>
        <country>Spain</country>
</Inquisition>
"""

class Handler(xml.sax.handler.ContentHandler):

    def endDocument(self):
        # uncomment this to avoid the leak
        #self._locator = None
        pass

xml.sax.parseString(xmldata, Handler())

print "gc.collect() ->", gc.collect()


Typical output:

gc: collectable <dict 0x815c1fc>
gc: collectable <ExpatParser instance at 0x817444c>
gc: collectable <dict 0x816b50c>
gc: collectable <DTDHandler instance at 0x816b59c>
gc: collectable <dict 0x819bdd4>
gc: collectable <EntityResolver instance at 0x816b5c4>
gc: collectable <dict 0x816be04>
gc: collectable <InputSource instance at 0x816b6f4>
gc: collectable <list 0x81253d4>
gc: collectable <Handler instance at 0x814aeec>
gc: collectable <dict 0x812466c>
gc.collect() -> 11



The problem is that the ExpatParser (in xml.sax.expatreader) is both the
parser and locator. The parser has a reference to the contenthandler, of
course. The ExpatParser also passes itself to the handler's
setDocumentLocator method which by default stores it in self._locator,
creating a circular reference.

I can see several ways to fix it:

1. have the parse method call self._cont_handler.setDocumentLocator(None)

   This would be a slight change of the interface, though. Existing
   content handlers might assume that the parameter of
   setDocumentLocator is always a valid locator

2. Create a separate locator class for the expatparser which only
   references the actual parser C-object. The ExpatParser instance could
   also hold a reference to it so that it can be updated on reset()

3. Use a weak reference proxy to the parser as the argument to
   setDocumentLocator.

I could implement one of these and submit a patch. Once you know about
this, it's easily worked around, though, so it might not be worth
fixing, but it should at least be documented.


   Bernhard

-- 
Intevation GmbH                                 http://intevation.de/
Sketch                                 http://sketch.sourceforge.net/
MapIt!                                               http://mapit.de/