10GB XML Blows out Memory, Suggestions?
Fredrik Lundh
fredrik at pythonware.com
Wed Jun 7 03:06:03 EDT 2006
axwack at gmail.com wrote:
> Paul,
>
> This is interesting. Unfortunately, I have no control over the XML
> output. The file is from Goldmine. However, you have given me an
> idea...
>
> Is it possible to read an XML document in compressed format?
sure. you can e.g. use gzip.open to create a file object that
decompresses on the way in.
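    import gzip
    import xml.etree.ElementTree as ET  # stdlib location in Python 2.5+

    file = gzip.open("data.xml.gz")
    for event, elem in ET.iterparse(file):
        if elem.tag == "item":
            # ... process the record here, then release its contents
            elem.clear()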
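one thing to watch with a really big file: elem.clear() empties each record, but the emptied elements still hang off the root. a common variation is to grab the root from the first "start" event and clear that instead. here's a rough, self-contained sketch of that idea; the count_items helper, the "item"/"data" tag names and the in-memory sample document are all made up for illustration, and the stdlib xml.etree.ElementTree import is an assumption:

```python
import gzip
import io
import xml.etree.ElementTree as ET


def count_items(fileobj, tag="item"):
    """Stream-parse fileobj and count <tag> elements without keeping them around."""
    context = ET.iterparse(fileobj, events=("start", "end"))
    event, root = next(context)  # the first event is the start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # ... process elem here ...
            root.clear()  # drop this record (and earlier siblings) from the root
    return count


# build a tiny gzip-compressed document in memory (a stand-in for data.xml.gz)
xml = b"<data>" + b"<item><name>x</name></item>" * 1000 + b"</data>"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(xml)
buf.seek(0)

# decompress on the way in, exactly as with gzip.open on a real file
with gzip.GzipFile(fileobj=buf, mode="rb") as f:
    n = count_items(f)
```

for a file on disk, passing gzip.open("data.xml.gz") to the helper should behave the same way.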
I tried compressing my 1 GB example, but all 1000-byte records in that
file are identical, so I got a 500x compression, which is a bit higher
than you can reasonably expect ;-) however, with that example, I get a
stable parsing time of 26 seconds, so it looks as if gzip can produce
data about as fast as a preloaded disk cache...
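if you want to try a scaled-down version of that experiment yourself, something along these lines should do. it's only a sketch: the record layout, the 2000-record size and the variable names are assumptions, not the file I used, and the ratio and timing you get will depend on your data and machine.

```python
import gzip
import io
import time
import xml.etree.ElementTree as ET

# one 1000-byte record: 6 bytes of <item>, 987 bytes of payload, 7 bytes of </item>
record = b"<item>" + b"x" * 987 + b"</item>"
raw = b"<data>" + record * 2000 + b"</data>"

# identical records compress far better than real-world data would
packed = gzip.compress(raw)
ratio = len(raw) / len(packed)

# time a streaming parse of the compressed data
start = time.perf_counter()
n = 0
for event, elem in ET.iterparse(gzip.open(io.BytesIO(packed))):
    if elem.tag == "item":
        n += 1
        elem.clear()
elapsed = time.perf_counter() - start

print("ratio: %.0fx, items: %d, time: %.2fs" % (ratio, n, elapsed))
```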
</F>