10GB XML Blows out Memory, Suggestions?
Fredrik Lundh
fredrik at pythonware.com
Wed Jun 7 12:30:07 EDT 2006
fuzzylollipop wrote:
> depends on the CODE and the SIZE of the file, in this case
> processing 10GB of file, unless that file is heavily encrypted or
> compressed, the process will be IO bound PERIOD!
so the fact that
    for token, node in pulldom.parse(file):
        pass
is 50-200% slower than
    for event, elem in ET.iterparse(file):
        if elem.tag == "item":
            elem.clear()
when reading a gigabyte-sized XML file, is due to an unexpected slowdown
in the I/O subsystem after importing xml.dom?
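for anyone who wants to try the comparison on their own machine, here is a minimal sketch of how the two loops above could be timed. it is not the benchmark behind the numbers quoted here: the helper names (make_sample, time_pulldom, time_iterparse) and the small in-memory sample document are assumptions made for illustration, and the timings will depend on the Python version, the parser build, and the real file.

```python
import io
import time
import xml.etree.ElementTree as ET
from xml.dom import pulldom


def make_sample(n_items=20000):
    """Build a small in-memory XML document (hypothetical stand-in for a big file)."""
    parts = ["<root>"]
    for i in range(n_items):
        parts.append("<item id='%d'><name>item %d</name></item>" % (i, i))
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def time_pulldom(data):
    """Pull every event from pulldom without doing any work on it."""
    start = time.perf_counter()
    events = 0
    for token, node in pulldom.parse(io.BytesIO(data)):
        events += 1
    return events, time.perf_counter() - start


def time_iterparse(data):
    """Walk the default 'end' events, clearing each finished <item> element."""
    start = time.perf_counter()
    items = 0
    for event, elem in ET.iterparse(io.BytesIO(data)):
        if elem.tag == "item":
            items += 1
            elem.clear()  # drop the element's children and text to keep memory flat
    return items, time.perf_counter() - start


if __name__ == "__main__":
    data = make_sample()
    events, t_pulldom = time_pulldom(data)
    items, t_iterparse = time_iterparse(data)
    print("pulldom:   %d events in %.3f s" % (events, t_pulldom))
    print("iterparse: %d items in %.3f s" % (items, t_iterparse))
```

both loops read the same bytes from memory, so any difference between the two timings comes from the parsing code and not from I/O.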
> I work with TeraBytes of files, and all our Python code is just as fast
> as equivalent C code for IO bound processes.
so how large are the things that you're actually *processing* in your
Python code? megabyte blobs or 100-1000 byte records? or even smaller
things?
</F>