[XML-SIG] Re: Some questions from a beginner

Fredrik Lundh fredrik at pythonware.com
Sun Feb 29 02:27:15 EST 2004


Derek Fountain wrote:

> The truth is you already have your answer. I'll be interested to see if anyone
> else describes a different process, but as far as I am concerned, DOM and SAX
> are the alternatives to choose from. 100MB isn't that much to handle in DOM
> on a modern machine (my desktop has 1GB of RAM so can handle data several
> times that size without swapping), so DOM is a valid option.

what DOM library are you using that only needs a few bytes in
memory for each byte on disk?

(last time I checked, minidom and friends used around 50 bytes
per source byte, on typical samples.  libxml may do a better job,
but it's hard to get under 20 bytes with a Python-based object
model.)
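
to put rough numbers on that, here's a back-of-the-envelope sketch.  it
uses the 100MB file and 1GB desktop from the quoted message and the 20
and 50 bytes-per-source-byte figures above; the function name is made up
for illustration, and real overhead depends on the library and the data.

```python
def estimated_tree_mb(source_mb, bytes_per_source_byte):
    """Rough in-memory size of a fully built tree, in megabytes."""
    return source_mb * bytes_per_source_byte

SOURCE_MB = 100   # file size from the quoted message
RAM_MB = 1024     # the 1GB desktop from the quoted message

for overhead in (20, 50):
    tree_mb = estimated_tree_mb(SOURCE_MB, overhead)
    fits = "fits in" if tree_mb <= RAM_MB else "exceeds"
    print("%d bytes/byte: about %d MB, %s %d MB of RAM"
          % (overhead, tree_mb, fits, RAM_MB))
```

under either figure, the full tree for a 100MB file is larger than 1GB of
RAM.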


if the source file has a regular structure (e.g. it contains many
thousands of records, all with an identical structure), you can use
an incremental DOM parsing approach: build a small tree for each
record, process it, and discard it before reading the next one.


here's an example for the elementtree library:

    http://effbot.org/zone/element-pull.htm

I'm sure you can use a similar approach with many other DOM libraries.
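
as a minimal sketch of the incremental idea, here is one way to do it
with the iterparse function in xml.etree.ElementTree (the version of
elementtree that ships with current Python releases).  this is not
necessarily the code from the page linked above; the "record" tag, the
sample data and the iter_records helper are assumptions made up for
illustration.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, tag):
    """Yield each complete <tag> element, then release it from the tree."""
    # ask for "start" events as well, so the root element can be captured
    context = ET.iterparse(source, events=("start", "end"))
    event, root = next(context)   # the first event is the root's "start"
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem
            # the caller is done with this record; drop everything
            # collected under the root so memory use stays flat
            root.clear()

# tiny in-memory stand-in for a large file of identical records
sample = io.BytesIO(
    b"<log><record id='1'>a</record><record id='2'>b</record></log>"
)
ids = [rec.get("id") for rec in iter_records(sample, "record")]
print(ids)
```

only one record's subtree is held in memory at a time, so peak memory
depends on the size of a record rather than the size of the file.  each
element has to be used before asking for the next one, since it is
cleared afterwards.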

</F>





