busting-out XML sections

Alex Martelli aleaxit at yahoo.com
Fri Oct 6 18:11:52 EDT 2000


"Thomas Gagne" <tgagne at ix.netcom.com> wrote in message
news:39DE24AE.71AD86C1 at ix.netcom.com...
> I thought about using DOM, until I considered the size of the input files.
> If there was a way to position the file so that you could essentially read
> documents out of the file this would work wonderfully.  Since DOM (as I
> understand) will build the entire tree in memory, it makes processing
> multi-megabyte files inadvisable.  Even given a system with a lot of memory
> it is wrong to assume you have complete use of the system's memory.

An XML DOM implementation doesn't *have* to build a whole tree in memory: it
*may*, but a clever implementation might well choose another strategy when it
discovers the incoming XML document is really huge.  E.g., if the XML lives in
a file that's readable and randomly accessible, the DOM implementation might
do a single pass over it (possibly to validate it, but mainly to determine the
key pointers/bookmarks within it), then use those bookmarks, a modest amount
of memory, and some reprocessing to provide all of the DOM functionality.
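
To make the bookmark idea concrete, here is a minimal sketch of how one might
do it by hand in Python.  It is not how any particular DOM implementation
works.  The function names index_records and load_record are made up for this
illustration, and the sketch assumes a UTF-8 file whose root element holds a
flat sequence of record elements separated only by whitespace (comments or
processing instructions between records are tolerated).

```python
import xml.parsers.expat
from xml.dom import minidom


def index_records(path):
    """Single pass: return (start, end) byte offsets for each child of the root.

    Hypothetical helper.  Only the offsets are kept in memory, not a tree.
    """
    starts = []
    state = {"depth": 0, "root_end": None}
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        state["depth"] += 1
        if state["depth"] == 2:
            # Inside a callback, CurrentByteIndex is the offset of the
            # first byte of the markup that triggered the event.
            starts.append(parser.CurrentByteIndex)

    def end_element(name):
        state["depth"] -= 1
        if state["depth"] == 0:
            # Offset of the root's end tag: the last record stops here.
            state["root_end"] = parser.CurrentByteIndex

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    with open(path, "rb") as f:
        parser.ParseFile(f)

    # Each record runs up to the start of the next one (or the root's end).
    ends = starts[1:] + [state["root_end"]]
    return list(zip(starts, ends))


def load_record(path, bookmark):
    """Seek to one bookmarked record and build a small DOM for it alone."""
    start, end = bookmark
    with open(path, "rb") as f:
        f.seek(start)
        chunk = f.read(end - start)
    # Bytes without an XML declaration are parsed as UTF-8 (assumption above).
    return minidom.parseString(chunk)
```

The list of bookmarks costs a couple of integers per record, and each
load_record call holds just one record's tree in memory at a time.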

I think it's wrong to assume that the implementation of crucial infrastructure
such as XML DOM analysis is defective, just because it *might* be, and might
thus cause problems if you ever have to process a gigabyte-size input file.
Why not "do the simplest thing you can possibly get away with", as XP teaches?
Using an XML DOM approach is often simplest.  Use it, and see whether you can
get away with it.  If and when the XML DOM implementation you have proves
defective, upgrade to a better one, and, *poof*, the problem disappears
again...
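
For instance, the "simplest thing" might look like the sketch below, which
uses the standard library's xml.dom.minidom.  The function name
bust_out_sections and its tag parameter are invented for this example.  It
builds the whole tree in memory, which is exactly the thing to try first.

```python
from xml.dom import minidom


def bust_out_sections(xml_text, tag):
    """Return each <tag> element of the document as its own XML string.

    Hypothetical helper: parses the entire input into one DOM tree.
    Note that getElementsByTagName also returns nested elements with
    the same tag name, if there are any.
    """
    dom = minidom.parseString(xml_text)
    try:
        return [node.toxml() for node in dom.getElementsByTagName(tag)]
    finally:
        # Release the tree promptly once the sections are extracted.
        dom.unlink()
```

If this turns out to be too hungry for your real input files, that is the
moment to look for a smarter implementation, not before.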


Alex

More information about the Python-list mailing list