minidom memory performance

John Wilson tug at wilson.co.uk
Fri May 23 14:19:47 EDT 2003


----- Original Message ----- 
From: "Geoff Gerrietts" <geoff at gerrietts.net>
To: "Python List" <python-list at python.org>
Sent: Friday, May 23, 2003 5:41 PM
Subject: minidom memory performance


> I was noticing today -- after some talks with a friend who tried to
> use minidom on a 28MB XML file and having it run thru 8GB of main
> memory on Alpha -- that when I load my own 26kB XML file into a
> minidom tree, my process size grows a little more than 2MB.
>
> By my very rough calculations, that's in the neighborhood of 10000%
> memory consumption. Does this seem right to everyone? Is this just the
> overhead that goes with using pure python classes to implement a
> data model that's heavy on classes and light on data? Is there
> something else that's going on there that I don't understand?

I'm not surprised at the size. DOMs are not a great way of dealing with very
large XML files, because the whole document has to be held in memory as a
tree of objects.
Generally, large XML files consist of many repetitions of the same complex
element. It might be possible to do some black magic which builds a minidom
instance for each of these elements, processes it, then junks it and builds a
minidom instance for the next element. A better approach is to use SAX and
process the document on the fly.
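Both ideas can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical document made of repeated <record> elements, each with a <name> child; the tag names, the SAMPLE data and the function names are invented for illustration. The first function uses the standard library's xml.dom.pulldom, which is one way to get the "one minidom subtree per element, then throw it away" behaviour. The second is a plain SAX handler that keeps only the text it needs.

```python
import io
import xml.sax
from xml.dom import pulldom

# Hypothetical sample: many repetitions of the same <record> element.
SAMPLE = b"""<records>
  <record id="1"><name>alpha</name></record>
  <record id="2"><name>beta</name></record>
  <record id="3"><name>gamma</name></record>
</records>"""


def _text_of(element):
    """Join the text children of a minidom element."""
    return "".join(c.data for c in element.childNodes
                   if c.nodeType == c.TEXT_NODE)


def names_via_pulldom(source, tag="record"):
    """Expand one <tag> element at a time into a small minidom subtree."""
    names = []
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            events.expandNode(node)  # builds children for this element only
            name = node.getElementsByTagName("name")[0]
            names.append(_text_of(name))
            # Nothing else references `node`, so it can be freed once the
            # loop moves on to the next event.
    return names


class NameCollector(xml.sax.ContentHandler):
    """Pure SAX: keep only the text of <name> elements as they stream past."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buf = None  # collects character data while inside <name>

    def startElement(self, name, attrs):
        if name == "name":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "name":
            self.names.append("".join(self._buf))
            self._buf = None


def names_via_sax(source):
    handler = NameCollector()
    xml.sax.parse(source, handler)
    return handler.names


print(names_via_pulldom(io.BytesIO(SAMPLE)))
print(names_via_sax(io.BytesIO(SAMPLE)))
```

Neither function ever holds more than one record's worth of tree (pulldom) or one name's worth of text (SAX), so memory use should stay roughly flat as the file grows, unlike xml.dom.minidom.parse, which builds the whole document tree up front.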

Can you give us some idea of the document structure and what you want to do
with it?

John Wilson
The Wilson Partnership
http://www.wilson.co.uk
