10GB XML Blows out Memory, Suggestions?

Paul Boddie paul at boddie.org.uk
Wed Jun 7 13:50:21 EDT 2006


gregarican wrote:
> Am I missing something? I don't read where the poster mentioned the
> operation as being CPU intensive. He does mention that the entirety of
> a 10 GB file cannot be loaded into memory. If you discount physical
> swapfile paging and assume a "normal" PC with perhaps 1 or 2 GB of
> RAM, is his assumption that out of line?

Indeed. The complaint is fairly obvious from the title of the thread.
Now, if the complaint were specifically about the size of the minidom
representation in memory, a more efficient representation could be
chosen by using another library. Even so, the in-memory representation
of a file that size is still likely to be very large, judging by the
observations and rough estimates collected here:

http://effbot.org/zone/celementtree.htm
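As a rough illustration of that point, here is a minimal sketch of
parsing a whole document with cElementTree in place of minidom; the
file name "data.xml" and the "record" tag are hypothetical, and the
tree is still built in full, just far more compactly than a minidom
DOM would be:

    try:
        import cElementTree as ElementTree   # effbot's standalone package
    except ImportError:
        from xml.etree import ElementTree    # bundled from Python 2.5

    # Parses the entire file into memory, as minidom would, but with a
    # much smaller per-element footprint.
    tree = ElementTree.parse("data.xml")
    for record in tree.getroot().findall("record"):
        pass    # ...process each record here...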

For many people, even an XML file of, say, 600MB would be quite a load
on their "home/small business edition" computer if the whole file had
to be loaded in before any work could be done on it, even just as a
text file. Approaches that avoid keeping a representation of the whole
document around would clearly be beneficial and, as mentioned
previously in a thread on large XML files, there's always the argument
that some kind of database system should be employed to make querying
more efficient if you can't perform some kind of sequential
processing. Sketches of both approaches follow.
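To make the sequential approach concrete, here is a minimal sketch
using cElementTree's iterparse; again the file name and the "record"
tag are hypothetical. The key point is clearing each element once it
has been handled, so that memory use stays roughly constant no matter
how large the file is:

    try:
        import cElementTree as ElementTree   # effbot's standalone package
    except ImportError:
        from xml.etree import ElementTree    # bundled from Python 2.5

    def process(record):
        pass    # placeholder for whatever per-record work is needed

    # iterparse reports each element as its end tag is seen, so every
    # "record" subtree is complete by the time we receive it.
    for event, elem in ElementTree.iterparse("huge.xml"):
        if elem.tag == "record":
            process(elem)
            elem.clear()    # discard the subtree we no longer need

The cleared elements still leave empty placeholders attached to the
root, so at the 10 GB scale it is worth grabbing a reference to the
root element and clearing that periodically too, as the page above
describes.

And if the queries can't be expressed as one sequential pass, the same
loop can feed a database instead. A sketch using the sqlite3 module
(in the standard library from Python 2.5; the schema and the "name"
and "value" fields are invented for illustration):

    import sqlite3
    try:
        import cElementTree as ElementTree
    except ImportError:
        from xml.etree import ElementTree

    conn = sqlite3.connect("records.db")
    conn.execute("CREATE TABLE IF NOT EXISTS record (name TEXT, value TEXT)")

    # Stream records out of the XML and into the database, then let
    # SQL and its indexes take care of the querying.
    for event, elem in ElementTree.iterparse("huge.xml"):
        if elem.tag == "record":
            conn.execute("INSERT INTO record VALUES (?, ?)",
                         (elem.findtext("name"), elem.findtext("value")))
            elem.clear()
    conn.commit()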

Paul



