iterate over a series of nodes in an XML file
Stefan Behnel
stefan.behnel-n05pAM at web.de
Wed Jul 5 15:52:05 EDT 2006
rajarshi.guha at gmail.com wrote:
> I have an XML file which contains entries of the form:
>
> <idlist>
> <myID>1</myID>
> <myID>2</myID>
> ....
> <myID>10000</myID>
> </idlist>
>
> Currently, I have written a SAX based handler that will read in all the
> <myID></myID> entries and return a list of the contents of these
> entries. However this is not scalable and for my purposes it would be
> better if I could iterate over the list of <myID> nodes. Something
> like:
>
> for myid in getMyIDList(document):
>     print myid
You can try lxml 1.1.
http://cheeseshop.python.org/pypi/lxml/1.1alpha
Some documentation is here:
http://codespeak.net/svn/lxml/trunk/doc/api.txt
I haven't tested it, but you should be able to do this:
from lxml.etree import iterparse

last = None
for event, myid in iterparse(document_url, tag="myID"):
    print(myid.text)
    # drop the previously handled element from the tree
    if last is not None:
        last.getparent().remove(last)
    last = myid
Internally, iterparse builds up a tree, so the last three lines are there to
remove the myid elements from the tree that were already handled. This saves a
lot of memory for large documents.
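
If it helps, here is a rough sketch of the same idea wrapped in a generator, which is closer to the getMyIDList(document) loop asked for above. To keep it self-contained it uses the standard library's xml.etree.ElementTree.iterparse rather than lxml, so there is no tag= keyword or getparent(); the tag is checked by hand and handled children are dropped by clearing the root. The name iter_myids and the inline sample document are made up for illustration, and the sketch assumes the <myID> elements are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def iter_myids(source, tag="myID"):
    """Yield the text of each <tag> element, one at a time (hypothetical helper)."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the root element; keep a
    # reference to it so handled children can be dropped later.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem.text
            # Assumes <myID> sits directly under the root: clearing the
            # root discards the elements that were already handled.
            root.clear()


# Small in-memory stand-in for the real file (made up for this example).
sample = io.BytesIO(b"<idlist><myID>1</myID><myID>2</myID><myID>3</myID></idlist>")
ids = list(iter_myids(sample))
```

In real use you would pass a file name or an open file instead of the BytesIO object and loop over iter_myids(...) directly, so that only one <myID> element is held in memory at a time.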
Stefan