[XML-SIG] Re: [4suite] memory leak problem 4DOM - update

Uche Ogbuji uche.ogbuji@fourthought.com
Thu, 14 Dec 2000 21:11:24 -0700


We really should share this discussion with XML-SIG.

matt wrote:
> 
> Some answers to my own questions ... but still a problem
> 
> On Fri, 15 Dec 2000, matt wrote:
> > Ok, have done some more experimentation ..... I stepped through everything with
> > pdb, let things settle over a few iterations and then discovered the recurring
> > process.  Some of it is indeed in Py_expat .... in
> > xml/sax/drivers/drv_pyexpat.py to be specific.  The offending lines being the
> > buf = fileobj.read(16384) ones (see function below) ... these chomp 4 kb each
> > time through.  Well, they are not really that offending, they're just loading a
> > buffer like they are supposed to be doing.
> >
> 
> I made the patch.  Looking more carefully the memory gulp comes from
> self.parser.Parse in the parseFile function .....  which confuses me, because I
> made the patch, rebuilt and reinstalled ... including checking that all was
> updated :
> i.e. :
> copying xml/dom/ext/Printer.py -> build/lib.linux-i686-1.5/xml/dom/ext  (I had
> found a patch for that too)
> 
> gcc -g -O2 -fpic -DXML_NS -Iextensions/expat/xmltok
> -Iextensions/expat/xmlparse -I/usr/local/include/python1.5 -c extensi
> 
> copying build/lib.linux-i686-1.5/xml/parsers/pyexpat.so ->
> /usr/local/lib/python1.5/site-packages/xml/parsers
> 
> the patch was :
> Index: pyexpat.c
> ===================================================================
> RCS file: /cvsroot/pyxml/xml/extensions/pyexpat.c,v
> retrieving revision 1.16
> diff -u -r1.16 pyexpat.c
> --- pyexpat.c   2000/11/02 04:57:40     1.16
> +++ pyexpat.c   2000/12/05 00:00:33
> @@ -680,6 +680,7 @@
>      for (i=0; handler_info[i].name != NULL; i++) {
>          Py_XDECREF(self->handlers[i]);
>      }
> +    free (self->handlers);
>  #if PY_MAJOR_VERSION == 1 && PY_MINOR_VERSION < 6
>      /* Code for versions before 1.6 */
>      free(self);
> 
> and it indeed did succeed.
> 
> I guess I'll keep looking.  Has anyone else found that this patch did not help?
> 
> regards
> Matt
> 
> >
> >
> > def parseFile(self,fileobj,sysID=None):
> >         self.reset()
> >         self.sysID=sysID
> >         self.doc_handler.startDocument()
> >
> >         buf = fileobj.read(16384)
> >         while buf != "":
> >             if self.parser.Parse(buf, 0) != 1:
> >                 self.__report_error()
> >             buf = fileobj.read(16384)
> >         self.parser.Parse("", 1)
> >
> >         self.doc_handler.endDocument()
> >
> >
> > So the problem I see is the freeing of this buffer 'buf' : I can only guess a
> > few things :
> > 1) obviously it gets put into the py_expat parser document, and the space for
> > that frame gets allocated the first time through.  Perhaps the py_expat
> > document is not releasing this buffer properly when ext.ReleaseNode(d) calls
> > all the delete nodes.  I haven't looked for anything circular there.
> >
> > 2) the fileobj.read above is actually doing something weird.  The 4 kb seems
> > weird considering it a) reads up to 16384 bytes, and my file is only 190 bytes, and b)
> > 16384 bytes = 16 kb and not 4 kb.
> > 4 kb seems to me the size of some sort of stack frame for a function that never
> > gets released to be used again????
> >
> > Either way, using ext.ReleaseNode(d) did help somewhat, so I would guess that
> > py_expat is to blame somewhere.  I will now go in search of the patch for
> > py_expat and see if this solves the problem overall.
> >
> > to be continued .....
> >
> > Matt
> >
> >
> >
> >
> >
> > On Fri, 15 Dec 2000, Uche Ogbuji wrote:
> > > matt wrote:
> > > >
> > > > Using ext.ReleaseNode(d) helped partially.  On the first iteration through the
> > > > first loop it chomps about 332kb, which I never get back in either case, i.e.
> > > > a) using ext.ReleaseNode(d) or b) not.  After that I get smaller bites, if
> > > > using a) they are 4-12 kb bites, or in b) 16-20 kb bites.  Both methods seem to
> > > > oscillate between two values.  So there was an improvement, i.e. approx 8 kb
> > > > improvement with using ext.ReleaseNode(d).  That first jump in both methods is
> > > > a bit of a shock, especially because it never gets given back.  However I had
> > > > the feeling this first jump was just python memory allocation, and that it
> > > > might release it some time later.
> > >
> > > This is pretty common because of Python's dynamic nature.  The first
> > > time in the loop you are importing a whole bunch of modules, which are of
> > > course added to the memory footprint.  After that subsequent imports
> > > don't add to memory.  The little incremental jumps are probably indeed
> > > memory leaks, so any more info you have to help us track it down would
> > > be appreciated.
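
One way to separate the one-time import cost from a steady per-iteration
leak is to record the memory change after every pass through the loop and
look only at the passes after the first.  The following is a minimal sketch
in present-day Python, not the code used in this thread: measure_growth and
leaky_work are hypothetical names, and tracemalloc only sees allocations
made through Python's own allocator, so a leak from a raw malloc() in a C
extension (such as the self->handlers one patched above) would not show up
here and would need process-level figures instead.

```python
import gc
import tracemalloc


def measure_growth(work, iterations=5):
    """Return the change in traced Python memory (bytes) after each call."""
    tracemalloc.start()
    try:
        deltas = []
        gc.collect()
        previous, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            work()
            gc.collect()  # drop collectable garbage before measuring
            current, _peak = tracemalloc.get_traced_memory()
            deltas.append(current - previous)
            previous = current
        return deltas
    finally:
        tracemalloc.stop()


# Hypothetical stand-in for a leaking loop body: every call keeps a
# 10000-byte buffer alive by storing it in a module-level list.
_leaked = []


def leaky_work():
    _leaked.append(bytearray(10000))
```

A first delta that is large followed by deltas near zero would point to
imports and caches; deltas that stay positive on every pass, as with
leaky_work here, point to a leak.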

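For anyone who wants to exercise the same read/Parse loop as the quoted
parseFile without the SAX driver around it, here is a rough sketch against
the xml.parsers.expat module in present-day Python.  The name count_elements
and the element-counting handler are assumptions for illustration only.
Note that read(16384) is an upper bound, so a 190-byte file arrives in a
single chunk, and that the modern module raises ExpatError on bad input
instead of returning a status to compare against 1.

```python
import io
import xml.parsers.expat


def count_elements(fileobj, chunk_size=16384):
    """Feed a binary file object to expat in chunks; return the element count."""
    parser = xml.parsers.expat.ParserCreate()
    counts = {"elements": 0}

    def start_element(name, attrs):
        counts["elements"] += 1

    parser.StartElementHandler = start_element

    buf = fileobj.read(chunk_size)  # returns at most chunk_size bytes
    while buf:
        parser.Parse(buf, False)    # more data may follow
        buf = fileobj.read(chunk_size)
    parser.Parse(b"", True)         # signal end of input
    return counts["elements"]
```

Calling count_elements(io.BytesIO(data)) inside a loop gives a small test
case that involves only expat and none of the DOM code.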

-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python