Writing big XML files where beginning depends on end.

Magnus Lycka lycka at carmen.se
Thu Nov 24 05:10:20 EST 2005


We're using DOM to create XML files that describe fairly
complex calculations. The XML is structured as a big tree,
where elements in the beginning have values that depend on
other values further down in the tree. Imagine something
like below, but much bigger and much more complex:

<node sum="15">
     <node sum="10">
         <leaf>7</leaf>
         <node sum="3">
             <leaf>2</leaf>
             <leaf>1</leaf>
         </node>
     </node>
     <node sum="5">
         <leaf>5</leaf>
     </node>
</node>

We have to stick with this XML structure for now.

In some cases, building up a DOM tree in memory takes up
several GB of RAM, which is a real showstopper. The actual
file is maybe an order of magnitude smaller than the DOM tree. The
app uses libxml2 and is actually written in C++. A library
with much less memory overhead might be sufficient.

We've thought of writing a file that looks like this...

<node sum="#1">
     <node sum="#1.1">
         <leaf>7</leaf>
         <node sum="#1.1.1">
             <leaf>2</leaf>
             <leaf>1</leaf>
         </node>
     </node>
     <node sum="#1.2">
         <leaf>5</leaf>
     </node>
</node>

...and storing {"#1": "15", "#1.1": "10", ...} in a map,
then reading the file back in a piece at a time and performing
a simple search and replace to put the correct values in.
Then we need something that allows parts of the XML file
to be written to file and purged from RAM to avoid the
memory problem.
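
The placeholder idea above could be sketched roughly as follows
in Python. This is only an illustration under stated assumptions,
not the C++/libxml2 application: the nested-list input and the
names write_with_placeholders and fill_placeholders are
hypothetical stand-ins, and the placeholder map itself still
lives in RAM (one entry per node, far smaller than a DOM tree).

```python
import re

# Matches placeholder attributes such as sum="#1.1.1".
PLACEHOLDER = re.compile(r'sum="(#[0-9.]+)"')


def write_with_placeholders(tree, out, sums, key="#1", depth=0):
    """Pass 1: stream XML to `out` and record each node's sum in `sums`.

    `tree` is a hypothetical stand-in for the real calculation:
    a nested list where ints are leaves and lists are child nodes.
    Nothing but the current recursion path is held while writing.
    """
    indent = "    " * depth
    out.write(f'{indent}<node sum="{key}">\n')
    total = 0
    child_no = 0
    for item in tree:
        if isinstance(item, list):
            child_no += 1
            total += write_with_placeholders(
                item, out, sums, f"{key}.{child_no}", depth + 1
            )
        else:
            out.write(f"{indent}    <leaf>{item}</leaf>\n")
            total += item
    out.write(f"{indent}</node>\n")
    sums[key] = str(total)  # known only after all children are written
    return total


def fill_placeholders(src, dst, sums):
    """Pass 2: copy `src` to `dst` line by line, replacing placeholders."""
    for line in src:
        dst.write(
            PLACEHOLDER.sub(lambda m: f'sum="{sums[m.group(1)]}"', line)
        )
```

Usage would be to call write_with_placeholders with a temporary
file opened for writing and an empty dict, then reopen that file
for reading and pass it to fill_placeholders together with the
final output file. Both passes touch one line at a time, so
memory use should not grow with the size of the XML itself.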

Suggestions for solutions are appreciated.




