python xml DOM? pulldom? SAX?

jog jo at johannageiss.de
Mon Aug 29 11:17:04 EDT 2005


Hi,
I want to extract the text from some nodes of a huge XML file (1.5 GB). The
structure of the XML file is something like this:
<parent>
   <page>
    <title>bla</title>
    <id></id>
    <revision>
      <id></id>
      <text>blablabla</text>
    </revision>
   </page>
   <page>
   </page>
    ....
</parent>
I want to combine the text from page/title and page/revision/text for
every single page element. I then want to index these combined texts
one by one (so one index per page).
Which API is the most efficient for that: SAX (I don't think so), DOM,
or pulldom?
Or should I just use XPath somehow?
I don't want to do anything else with this XML file afterwards.
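
To make the task concrete, here is a minimal sketch of the per-page
extraction, assuming xml.dom.pulldom from the standard library. The
function name iter_page_texts and the SAMPLE data are made up for
illustration, and this is only one possible approach, not necessarily
the most efficient one. Only one page subtree is held in memory at a
time.

```python
import io
from xml.dom import pulldom

# Tiny stand-in for the real 1.5 GB file (hypothetical sample data).
SAMPLE = (b"<parent><page><title>bla</title><id></id>"
          b"<revision><id></id><text>blablabla</text></revision></page>"
          b"<page></page></parent>")


def _text_of(elem):
    """Join the direct text children of a DOM element."""
    return "".join(child.data for child in elem.childNodes
                   if child.nodeType == child.TEXT_NODE)


def iter_page_texts(source):
    """Yield one combined string (title + revision text) per <page>."""
    events = pulldom.parse(source)  # file name or file-like object
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "page":
            # Build a DOM subtree for this single page only.
            events.expandNode(node)
            parts = []
            for tag in ("title", "text"):
                for elem in node.getElementsByTagName(tag):
                    parts.append(_text_of(elem))
            yield " ".join(p for p in parts if p)


if __name__ == "__main__":
    for combined in iter_page_texts(io.BytesIO(SAMPLE)):
        print(repr(combined))  # hand each string to the indexer here
```

Each yielded string could then be passed to the indexer, one page at a
time.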
I hope someone will understand me.....
Thank you very much
Jog



