iterate over a series of nodes in an XML file

Stefan Behnel stefan.behnel-n05pAM at web.de
Wed Jul 5 15:52:05 EDT 2006


rajarshi.guha at gmail.com wrote:
> I have an XML file which contains entries of the form:
> 
> <idlist>
>  <myID>1</myID>
>  <myID>2</myID>
> ....
>  <myID>10000</myID>
> </idlist>
> 
> Currently, I have written a SAX-based handler that reads in all the
> <myID></myID> entries and returns a list of their contents. However,
> this is not scalable, and for my purposes it would be better if I could
> iterate over the list of <myID> nodes, something like:
> 
> for myid in getMyIDList(document):
>    print(myid)
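For comparison, the SAX approach from the question can also be made lazy without switching libraries. Below is a minimal sketch, assuming the standard library's `xml.sax`. The parser is fed the input one chunk at a time through `feed()`, and after each chunk the generator yields whatever `<myID>` texts the handler has completed. The names `MyIDHandler` and `iter_my_ids_sax` and the `chunk_size` parameter are hypothetical, chosen only for illustration.

```python
import io
import xml.sax


class MyIDHandler(xml.sax.ContentHandler):
    """Hypothetical handler: collects each <myID> text as it is completed."""

    def __init__(self):
        super().__init__()
        self.pending = []    # finished IDs not yet handed to the caller
        self._buffer = None  # text pieces of the currently open <myID>

    def startElement(self, name, attrs):
        if name == "myID":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one element's text in several pieces
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "myID":
            self.pending.append("".join(self._buffer))
            self._buffer = None


def iter_my_ids_sax(stream, chunk_size=64 * 1024):
    """Yield <myID> texts while feeding the parser one chunk at a time."""
    handler = MyIDHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
        # hand out whatever IDs were completed while parsing this chunk
        yield from handler.pending
        handler.pending.clear()
    parser.close()  # finish parsing; raises if the document is incomplete
    yield from handler.pending


if __name__ == "__main__":
    data = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID></idlist>")
    for myid in iter_my_ids_sax(data):
        print(myid)
```

Memory use then depends roughly on the chunk size and the number of IDs completed within one chunk, not on the length of the whole list.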

You can try lxml 1.1.

http://cheeseshop.python.org/pypi/lxml/1.1alpha

Some documentation is here:
http://codespeak.net/svn/lxml/trunk/doc/api.txt

I haven't tested it, but you should be able to do this:

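  from lxml.etree import iterparse

  # document_url stands for the document to parse (a file name or file object)
  last = None
  # iterparse() yields an (event, element) pair as each <myID> element ends
  for event, myid in iterparse(document_url, tag="myID"):
      print(myid.text)
      # remove the previously handled <myID> now that the loop is past it
      if last is not None:
          last.getparent().remove(last)
      last = myid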

Internally, iterparse still builds up a tree as it parses, so the last three
lines remove each <myID> element from the tree once the loop has moved past
it. That way the handled elements do not pile up in memory, which makes a big
difference for large documents.
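If lxml is not available, the standard library's `xml.etree.ElementTree.iterparse` can do a similar job, although it has no `tag=` filter and its elements have no `getparent()`. Below is a minimal sketch of a generator with the `getMyIDList()` interface from the question. It assumes, as in the example file, that every <myID> is a direct child of the root element; the name `iter_my_ids` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def iter_my_ids(source, tag="myID"):
    """Lazily yield the text of each <tag> element in `source`.

    Hypothetical helper; assumes every <tag> element is a direct
    child of the document's root element, as in <idlist>.
    """
    # ask for "start" events too, so the root element can be grabbed first
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            # drop the handled element so the tree does not keep growing
            root.remove(elem)


if __name__ == "__main__":
    sample = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID></idlist>")
    for myid in iter_my_ids(sample):
        print(myid)
```

Here the current element is removed right after it has been yielded rather than one step later; with `root.remove()` that only works for direct children of the root, which is why the assumption above matters.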

Stefan



More information about the Python-list mailing list