10GB XML Blows out Memory, Suggestions?

Fredrik Lundh fredrik at pythonware.com
Wed Jun 7 03:06:03 EDT 2006


axwack at gmail.com wrote:
> Paul,
> 
> This is interesting. Unfortunately, I have no control over the XML
> output. The file is from Goldmine. However, you have given me an
> idea...
> 
> Is it possible to read an XML document in compressed format?

sure.  you can e.g. use gzip.open to create a file object that 
decompresses on the way in.

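     import gzip
     import xml.etree.ElementTree as ET

     # gzip.open returns a file object that decompresses as it is read
     file = gzip.open("data.xml.gz")

     for event, elem in ET.iterparse(file):
         if elem.tag == "item":
             # process the record here, then release its contents
             elem.clear()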
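
For anyone who wants to try the snippet above without a real export at hand, here is a minimal self-contained sketch. It assumes Python 3 and the standard-library xml.etree.ElementTree; the helper names write_sample and count_items, the <items>/<id> layout, and the record count are made up for illustration and are not taken from the Goldmine file.

```python
import gzip
import os
import tempfile
import xml.etree.ElementTree as ET


def write_sample(path, count):
    # Write a small gzip-compressed XML file with `count` <item> records.
    # (Hypothetical layout, only for this demonstration.)
    with gzip.open(path, "wb") as out:
        out.write(b"<items>")
        for i in range(count):
            out.write(b"<item><id>%d</id></item>" % i)
        out.write(b"</items>")


def count_items(path):
    # Stream the compressed file; iterparse reports only "end" events
    # by default, so each <item> is complete when we see it.
    seen = 0
    with gzip.open(path, "rb") as source:
        for event, elem in ET.iterparse(source):
            if elem.tag == "item":
                seen += 1
                # Drop the record's children and text once it is handled.
                elem.clear()
    return seen


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "data.xml.gz")
    write_sample(sample, 1000)
    total = count_items(sample)
    print(total)
```

The file is never decompressed to disk or read into memory as a whole; gzip hands the parser decompressed bytes as it asks for them.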

I tried compressing my 1 GB example, but all 1000-byte records in that 
file are identical, so I got a 500x compression, which is a bit higher 
than you can reasonably expect ;-)  however, with that example, I get a 
stable parsing time of 26 seconds, so it looks as if gzip can produce 
data about as fast as a preloaded disk cache...
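
To see why identical records compress so unrealistically well, here is a rough sketch that builds a small in-memory stand-in for such a file and checks the ratio. The record content and the count of 1000 are invented for the example; the exact ratio depends on the data and the zlib settings, so treat whatever it prints as illustrative only.

```python
import gzip

# Hypothetical 1000-byte record; the padding stands in for real field data.
record = b"<item>" + b"x" * 987 + b"</item>"
assert len(record) == 1000

# 1000 identical records wrapped in a root element, about 1 MB in total.
raw = b"<items>" + record * 1000 + b"</items>"
packed = gzip.compress(raw)
ratio = len(raw) / len(packed)

print("raw bytes: %d, compressed bytes: %d, ratio: %.0fx"
      % (len(raw), len(packed), ratio))
```

Real exports with varying field values will compress far less than this.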

</F>



