minidom memory performance

Gilles Lenfant glenfant at NOSPAM.bigfoot.com
Fri May 23 16:20:45 EDT 2003


"Geoff Gerrietts" <geoff at gerrietts.net> wrote in message news:
mailman.1053708216.3330.python-list at python.org...
> I was noticing today -- after some talks with a friend who tried to
> use minidom on a 28MB XML file and having it run thru 8GB of main
> memory on Alpha -- that when I load my own 26kB XML file into a
> minidom tree, my process size grows a little more than 2MB.
>
> By my very rough calculations, that's in the neighborhood of 10000%
> memory consumption. Does this seem right to everyone? Is this just the
> overhead that goes with using pure python classes to implement a
> data model that's heavy on classes and light on data? Is there
> something else that's going on there that I don't understand?

Never use DOM to parse such a giant XML file: a DOM parser builds the
whole tree in memory.
In addition, minidom's memory factor (memory used relative to the file
size) isn't very good, and 4DOM's is worse.
Use SAX to split your document into small parts, and then use DOM, if
required, only on the elements of interest.
There are some tips at activestate.com on building DOM trees for elements
handled from SAX.
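
A minimal sketch of that SAX-then-DOM approach, assuming a current Python 3
and its standard xml.dom.pulldom module (neither is named above; pulldom is
just one convenient way to do it). The helper name iter_elements and the
sample <log>/<entry> document are made up for illustration. The stream is
read as SAX-style events, and a small minidom subtree is built only for the
elements you ask for:

```python
import io
from xml.dom import pulldom


def iter_elements(source, tag):
    """Yield a minidom subtree for each <tag> element found in the stream.

    `source` may be a filename or a binary file-like object.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull events until the matching end tag and attach the
            # children, so `node` becomes a complete DOM subtree.
            events.expandNode(node)
            yield node


# Hypothetical sample document, standing in for a much larger file.
sample = (
    b"<log>"
    b"<entry id='1'><msg>first</msg></entry>"
    b"<entry id='2'><msg>second</msg></entry>"
    b"</log>"
)

ids = [entry.getAttribute("id")
       for entry in iter_elements(io.BytesIO(sample), "entry")]
print(ids)
```

Each yielded node supports the usual minidom calls (getAttribute,
getElementsByTagName, toxml...), so existing DOM code should be reusable on
the pieces without ever building the tree for the whole document.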

There is a comparison table of DOM parser performance (speed, memory
factor...) here:

http://www.reportlab.com/xml/pyrxp.html

As you can see there, minidom performs badly on memory factor. Anyway,
even with the DOM parser that has the best memory factor (MSXML), don't
rely 100% on DOM for such giant files.
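
If you want a rough figure for your own files, here is a small sketch that
estimates minidom's memory factor, assuming Python 3 with the standard
tracemalloc module. The function name minidom_memory_factor and the sample
document are hypothetical. tracemalloc only sees allocations made through
Python's allocator, so treat the result as an approximation, not as the
process-size growth you would see in top:

```python
import tracemalloc
from xml.dom import minidom


def minidom_memory_factor(xml_bytes):
    """Return peak traced memory while parsing, divided by the source size."""
    tracemalloc.start()
    try:
        doc = minidom.parseString(xml_bytes)
        _, peak = tracemalloc.get_traced_memory()
        doc.unlink()  # break reference cycles so the tree can be freed
    finally:
        tracemalloc.stop()
    return peak / len(xml_bytes)


# Hypothetical sample: many small elements, i.e. lots of nodes, little data.
sample = b"<root>" + b"<item n='1'>text</item>" * 1000 + b"</root>"

factor = minidom_memory_factor(sample)
print(f"approximate memory factor: {factor:.1f}x")
```

The number depends a lot on the shape of the document: many tiny elements
cost far more per byte of XML than a few long text nodes.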

--Gilles





