cElementTree clear semantics

Sun Sep 25 15:57:45 EDT 2005

[ Fredrik Lundh ]

[ ... ]

> the iterparse/clear approach works best if your XML file has a
> record-like structure. if you have toplevel records with lots of
> schnappi records in them, iterate over the records and use find
> (etc) to locate the subrecords you're interested in: (...)

The problem is that the file looks like this:

<data>
  <schnappi>
    <color>green</color>
    <friends>
      <friend>
        <id>Lama</id>
        <color>white</color>
      </friend>
      <friend>
        <id>mother schnappi</id>
        <color>green</color>
      </friend>
    </friends>
    <food>
      <id>human</id>
      <id>rabbit</id>
    </food>
  </schnappi>
  <schnappi>
    <!-- something interesting -->
  </schnappi>
  <!-- 60,000 more schnappis -->
</data>

... and there is really nothing above <schnappi>. The "something
interesting" part consists of a variety of elements, and calling
findall for each of them, although possible, would probably be
impractical (say, distinguishing a <friend>'s color from a
<schnappi>'s).
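
For reference, here is a minimal sketch of what the per-element lookups would look like for the sample above. It assumes the modern xml.etree.ElementTree module as a stand-in for cElementTree, and the variable names are illustrative only. A bare tag name as the path matches direct children, so the <schnappi>'s own color and the friends' colors can be told apart, but every kind of element needs its own path:

```python
import xml.etree.ElementTree as ET  # assumed stand-in for cElementTree

# One <schnappi> record, trimmed from the sample above.
schnappi = ET.fromstring("""<schnappi>
    <color>green</color>
    <friends>
      <friend><id>Lama</id><color>white</color></friend>
      <friend><id>mother schnappi</id><color>green</color></friend>
    </friends>
</schnappi>""")

# A bare tag name only matches direct children of <schnappi>.
own_color = schnappi.findtext("color")

# A longer path reaches the colors that belong to the friends.
friend_colors = [c.text for c in schnappi.findall("friends/friend/color")]

# iter() walks the whole subtree and mixes both kinds together.
all_colors = [c.text for c in schnappi.iter("color")]
```

This works, but it needs one such path per element type, which is the impractical part when the records contain a variety of elements.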

Conceptually I need an "XML subtree iterator", rather than an XML
element iterator. <schnappi>-elements are the ones having a complex
internal structure, and I'd like to be able to speak of my XML as a
sequence of Python objects representing <schnappi>s and their internal
structure.
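
A rough sketch of such a subtree iterator, built on iterparse and clear, might look like the following. It assumes xml.etree.ElementTree in place of cElementTree, and that the records are direct children of the root element; iter_subtrees and SAMPLE are hypothetical names made up for this illustration.

```python
import io
import xml.etree.ElementTree as ET  # assumed stand-in for cElementTree


def iter_subtrees(source, tag):
    """Yield each complete <tag> element, then discard it (hypothetical helper)."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem    # the whole subtree has been parsed at this point
            root.clear()  # drop finished children of the root to free memory


# Shortened stand-in for the real file.
SAMPLE = b"""<data>
  <schnappi><color>green</color><food><id>human</id><id>rabbit</id></food></schnappi>
  <schnappi><color>red</color></schnappi>
</data>"""

colors = []
foods = []
for schnappi in iter_subtrees(io.BytesIO(SAMPLE), "schnappi"):
    colors.append(schnappi.findtext("color"))
    foods.append([i.text for i in schnappi.findall("food/id")])
```

Each yielded element is only valid until the loop asks for the next one, since the root is cleared right after the caller is done with it; anything worth keeping has to be copied out first.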

[ ... ]

> (I've reorganized the code a bit to cut down on the operations. also
> note the "is" trick; iterparse returns the event strings you pass
> in, so comparing on object identities is safe)

Neat trick.

Thank you for your input,

ivr
-- 
"...but it's HDTV -- it's got a better resolution than the real world."
		                           -- Fry, "When aliens attack"