Iterparse and ElementTree confusion

paul.sherwood at gmail.com
Wed Aug 17 05:12:14 EDT 2005


Hi

I'm trying to parse a large (150MB) XML file in order to extract specific
required records.

import sys
from elementtree.ElementTree import ElementTree

root = ElementTree(file='big.xml')

This works fine for smaller versions of the same XML file, but when I
attempted the above on the full file my PC went to la-la land: there was
much HDD grinding, followed by a "Windows running low on virtual memory"
popup after 10-15 minutes. Then just more grinding for an hour before I
gave up.

XML file format:
<root>
  <rubbish1>
  .
  .
  </rubbish1>
  .
  .
  <rubbishX>
  .
  .
  </rubbishX>
  <Products>
    <Product ID="QU17861" UserTypeID="PH_QUOTE" QualifierID="Qualifier root" ParentID="LIVE_AREA">
    <Name QualifierID="Qualifier root">23172</Name>
    <Description QualifierID="Qualifier root">Three Spot Rail Light Brushed Chrome</Description>
    <ClassificationReference ClassificationID="W2 at Kitchen Lighting" QualifierID="Qualifier root" Type="" />
    <ProductReference ProductID="QU17749" QualifierID="Qualifier root" Type="Accessory / Linked Product">
    <Name QualifierID="Qualifier root">73520</Name>
    .
    .etc
   </Product>
  </Products>
</root>

OK, I thought, surely there's a way to parse this thing in chunks until
I get to the element I require, and then I can reuse the ElementTree
goodness.

I found iterparse:

def parse_for_products(filename):

    for event, elem in iterparse(filename):
        if elem.tag == "Products":
            root = ElementTree(elem)
            print_all(root)
        else:
            elem.clear()

My problem is that if I take the 'elem' found by iterparse and then try to
print all of its attributes, children and tail text, I only get
elem.tag. elem.keys() returns nothing, as do all of the other previously
useful ElementTree methods.
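
One likely explanation can be sketched as follows. By default iterparse only reports "end" events, so every descendant of Products reaches the else branch and is cleared before Products itself is reported. The sketch assumes the standard-library xml.etree.ElementTree rather than the older elementtree package, and both SAMPLE and children_seen_at_products_end are made-up names used only for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for the real file (hypothetical data).
SAMPLE = b"<root><Products><Product ID='QU17861'><Name>23172</Name></Product></Products></root>"

def children_seen_at_products_end(xml_bytes):
    """Run the same loop as in the post and report what Products still holds."""
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        if elem.tag == "Products":
            # By now every child has already passed through the else branch.
            return [(child.tag, dict(child.attrib), len(child)) for child in elem]
        else:
            elem.clear()  # wipes attributes, text and children of this element
    return None

result = children_seen_at_products_end(SAMPLE)
print(result)
```

The Product element is still attached to Products, but its attributes and its own children are gone by the time the loop reaches the Products end tag.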

Am I right in thinking that you can pass an element into ElementTree?
How might I manually iterate through <Product>...</Product>, grabbing
everything?
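
A minimal sketch of one way to iterate over the Product elements without holding the whole file in memory, again assuming the standard-library xml.etree.ElementTree. Each Product is handled at its own "end" event, when its attributes and children are complete, and only then is the tree cleared. The names iter_products and SAMPLE, and the shape of the yielded dict, are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down, well-formed stand-in for the real file (hypothetical data).
SAMPLE = b"""<root>
  <rubbish1><junk>ignored</junk></rubbish1>
  <Products>
    <Product ID="QU17861" UserTypeID="PH_QUOTE">
      <Name QualifierID="Qualifier root">23172</Name>
      <Description QualifierID="Qualifier root">Three Spot Rail Light Brushed Chrome</Description>
    </Product>
  </Products>
</root>"""

def iter_products(source):
    """Yield one dict per <Product>, freeing parsed elements as we go."""
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Product":
            # The element is complete here: attributes and children are all present.
            yield {
                "attrib": dict(elem.attrib),
                "children": [(c.tag, dict(c.attrib), c.text) for c in elem],
            }
            root.clear()  # drop everything parsed so far, including the rubbish

products = list(iter_products(io.BytesIO(SAMPLE)))
for product in products:
    print(product["attrib"]["ID"], product["children"])
```

Only direct children are collected here; nested elements such as a Name inside ProductReference would need a recursive walk. Passing a filename instead of a BytesIO object should work the same way.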



