[XML-SIG] Large xml databases and python

Fredrik Lundh fredrik at pythonware.com
Mon Aug 21 14:38:24 CEST 2006


Anstey, Matthew wrote:

> Our question is this: when we finish porting our 300 MB of "python" data 
> into 3 GB of XML data, how can we continue to read it from disk in its 
> xml format and manipulate it?
>  
> We are looking at Berkeley XML with the Python API, but are concerned 
> this is not the best solution. We have also dabbled with Amara and 
> ElementTree, but the size of our XML is giving us problems.

if the Python version of the data fits in memory, you can use iterparse 
and the "incremental decoding" approach outlined here:

     http://effbot.org/zone/element-iterparse.htm
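
as a rough sketch of that approach (the "record" tag, the flat field layout, and the iter_records name are assumptions for illustration, not something taken from your data), you decode each record into a plain Python object as soon as it's complete, then clear the tree so memory use stays bounded:

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, tag="record"):
    """Yield one plain dict per <record> element (hypothetical layout)."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            # decode the finished record into ordinary Python data
            yield {child.tag: child.text for child in elem}
            # drop already-processed children from the root
            root.clear()

# small in-memory stand-in for the real file on disk
sample = io.BytesIO(
    b"<data>"
    b"<record><id>1</id><name>a</name></record>"
    b"<record><id>2</id><name>b</name></record>"
    b"</data>"
)
records = list(iter_records(sample))
```

for a real file, pass the filename (or an open binary file) instead of the BytesIO object.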

to save the data, you can build subtrees (e.g. on a record level) and 
write each tree out by itself.

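     # write one subtree per record; the full document is never built in memory
     with open("out.xml", "w", encoding="utf-8") as f:
         f.write("<data>")
         for record in data:
             tree = make_record_tree(record)  # returns an ElementTree
             # text-mode output, no per-record XML declaration
             tree.write(f, encoding="unicode", xml_declaration=False)
         f.write("</data>")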
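
make_record_tree is left up to you. one possible version, assuming each record is a flat dict whose keys are valid XML element names (both are assumptions, and the "record" tag is made up for the example), could look like this. it writes to an in-memory buffer instead of a file so the result is easy to inspect:

```python
import io
import xml.etree.ElementTree as ET

def make_record_tree(record):
    """Hypothetical helper: turn a flat dict into a <record> ElementTree."""
    elem = ET.Element("record")
    for key, value in record.items():
        # each key becomes a child element; values are stored as text
        ET.SubElement(elem, key).text = str(value)
    return ET.ElementTree(elem)

data = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

out = io.StringIO()  # stand-in for a text-mode output file
out.write("<data>")
for record in data:
    # encoding="unicode" makes write() emit str rather than bytes
    make_record_tree(record).write(out, encoding="unicode", xml_declaration=False)
out.write("</data>")
xml_text = out.getvalue()
```

since only one record's tree exists at a time, peak memory depends on the largest record, not on the size of the output file.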

</F>


