minidom memory performance

Geoff Gerrietts geoff at gerrietts.net
Fri May 30 14:25:15 EDT 2003


Geoff Gerrietts wrote:
>that when I load my own 26kB XML file into a
>minidom tree, my process size grows a little more than 2MB.
>
>By my very rough calculations, that's in the neighborhood of 10000%
>memory consumption. Is there something else that's going on there
>that I don't understand?

I wrote last week about a friend who was having trouble using DOM
implementations in Python, because his 28MB XML file would blow up in
the DOM parser after running through about 40 minutes and 8GB of main
memory. I wrote about my own (unscientific) experiment using minidom,
where passing a 26kB file through xml.dom.minidom.parse swelled my
process size by a bit more than 2MB.
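For anyone who wants to repeat that experiment, here is a rough sketch. It is written for modern Python 3, not the 2.2.2 discussed here, and it rests on a few assumptions. It uses tracemalloc, which counts only memory allocated through Python's allocator rather than total process size. It parses a made-up document of about 26kB instead of a real file. The `<record>`/`<name>`/`<value>` layout and the helper names are hypothetical.

```python
import tracemalloc
import xml.dom.minidom


def make_sample_xml(n_records=400):
    """Build a hypothetical XML document of roughly 26kB.

    The <record>/<name>/<value> layout is an assumption for illustration,
    not the file described in the post.
    """
    rows = "".join(
        f'<record id="{i}"><name>item {i}</name><value>{i * 3}</value></record>'
        for i in range(n_records)
    )
    return f"<records>{rows}</records>".encode("utf-8")


def dom_memory_ratio(xml_bytes):
    """Return (input size, bytes still allocated by the live DOM, ratio).

    tracemalloc only sees memory allocated through Python's allocator,
    so this approximates, but does not equal, growth in process size.
    """
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        doc = xml.dom.minidom.parseString(xml_bytes)
        after, _ = tracemalloc.get_traced_memory()  # measured while doc is alive
    finally:
        tracemalloc.stop()
    doc.unlink()  # break the tree's internal references so it can be freed
    grown = after - before
    return len(xml_bytes), grown, grown / len(xml_bytes)


if __name__ == "__main__":
    size, grown, ratio = dom_memory_ratio(make_sample_xml())
    print(f"input: {size} bytes, DOM: {grown} bytes, roughly {ratio:.0f}x")
```

The ratio it reports will vary with Python version and document shape, so treat it as an order-of-magnitude check rather than a benchmark. For the numbers quoted above, a little over 2MB divided by 26kB comes to roughly 80x.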

I haven't changed what I'm doing, because I don't really mind 2MB of
overhead. My friend did mind, though: on my advice, he moved to an
architecture built on pulldom. That let him keep using the DOM features
he depended on for speed of development, while getting the serial
processing and low memory overhead of a SAX parser. His process size now
tops out at 171MB, and the job runs to completion in 5 minutes on his
desktop (not his production servers!).
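The pattern that makes this work can be sketched as follows. Again, this is modern Python 3, and it assumes a hypothetical document made of repeated `<record>` elements. The tag names, `SAMPLE`, and the helper functions are illustrative, not my friend's code. pulldom hands back a stream of events. When the loop reaches an element it cares about, `expandNode()` builds just that element's subtree, so ordinary DOM calls such as `getElementsByTagName()` work on it. The rest of the file is never assembled into one big tree.

```python
import io
from xml.dom import pulldom

# Hypothetical sample input; a real program would pass a filename or open file.
SAMPLE = (
    b'<records>'
    b'<record id="1"><name>alpha</name></record>'
    b'<record id="2"><name>beta</name></record>'
    b'</records>'
)


def iter_records(source, tag="record"):
    """Yield one expanded DOM element per <tag>, reading the input serially.

    `source` is a filename or a binary file object; `tag` is a hypothetical
    element name chosen for this example.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull the rest of this element's events into a DOM subtree.
            events.expandNode(node)
            yield node


def summarize(source):
    """Collect (id, name) pairs, keeping only plain strings between records."""
    results = []
    for record in iter_records(source):
        # Ordinary DOM methods work on the expanded subtree.
        name = record.getElementsByTagName("name")[0].firstChild.data
        results.append((record.getAttribute("id"), name))
    return results


if __name__ == "__main__":
    print(summarize(io.BytesIO(SAMPLE)))
```

Because `summarize()` keeps only the extracted strings, each expanded record can be freed once the loop moves on. Holding on to the nodes themselves would let memory grow with the number of records kept.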

He says:
  I'm so jazzed, I could make a 'switch' commercial about it ;-)

I would encourage anyone in a similar situation to give pulldom a good
look, but don't bother looking in today's (2.2.2) official Python
docs. The coverage there is mostly inaccurate and entirely unusable.

Instead, check out: http://www.prescod.net/python/pulldom.html 

It's worth the time it takes to understand it.

Thanks,
--G.

-- 
Geoff Gerrietts             "That's it! I've had it with your sassy mouth!
<geoff at gerrietts net>     I didn't want to do this! (Well, actually, 
http://www.gerrietts.net/    I did....)"  -- Mojo Jojo, "Bubblevicious"


More information about the Python-list mailing list