10GB XML Blows out Memory, Suggestions?

Fredrik Lundh fredrik at pythonware.com
Tue Jun 6 15:52:44 EDT 2006


gregarican wrote:

> 10 gigs? Wow, even using SAX I would imagine that you would be pushing
> the limits of reasonable performance.

depends on how you define "reasonable", of course.  modern computers are 
quite fast:

 > dir data.xml

2006-06-06  21:35     1 002 000 015 data.xml
                1 File(s)  1 002 000 015 bytes

 > more test.py
from xml.etree import cElementTree as ET
import time

t0 = time.time()

for event, elem in ET.iterparse("data.xml"):
    if elem.tag == "item":
        elem.clear()

print time.time() - t0

gives me timings between 27.1 and 49.1 seconds over 5 runs.

(Intel Dual Core T2300, slow laptop disks, 1000000 XML "item" elements
averaging 1000 bytes each, bundled cElementTree, peak memory usage 33 MB.
your mileage may vary.)
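[editor's note: the same streaming technique works unchanged with the
stdlib ElementTree in modern Python 3. the sketch below is illustrative,
not from the original post: it generates a small file shaped like the
one described above (many <item> elements under one root), then streams
through it with iterparse(), clearing each finished element. the file
contents and item count are made up for the demo.]

from __future__ import print_function
import os
import tempfile
import xml.etree.ElementTree as ET

# generate a small test file: a root element wrapping many <item> elements
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("<root>\n")
    for i in range(10000):
        f.write("<item id='%d'>%s</item>\n" % (i, "x" * 100))
    f.write("</root>\n")

# stream through the file; iterparse() yields each element as its end
# tag is seen, and clear() drops its text/children so the in-memory
# tree stays small. (note: the root still accumulates lightweight
# empty <item> stubs; for very long runs you may also want to drop
# references from the root element itself.)
count = 0
for event, elem in ET.iterparse(path):  # default event is "end"
    if elem.tag == "item":
        count += 1
        elem.clear()  # release the parsed content we no longer need

print(count)  # prints 10000
os.remove(path)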

</F>




More information about the Python-list mailing list