iterate over a series of nodes in an XML file

Diez B. Roggisch deets at nospam.web.de
Wed Jul 5 12:57:18 EDT 2006


rajarshi.guha at gmail.com wrote:

> Hi, I have an XML file which contains entries of the form:
> 
> <idlist>
>  <myID>1</myID>
>  <myID>2</myID>
> ....
>  <myID>10000</myID>
> </idlist>
> 
> Currently, I have written a SAX based handler that will read in all the
> <myID></myID> entries and return a list of the contents of these
> entries. However this is not scalable and for my purposes it would be
> better if I could iterate over the list of <myID> nodes. Something
> like:
> 
> for myid in getMyIDList(document):
>    print myid
> 
> I realize that I can do this with generators, but I can't see how I can
> incorporate generators into my handler class (which is a subclass of
> xml.sax.ContentHandler).
> 
> Any pointers would be appreciated

Use ElementTree. Or one of the other packages that implement its very
pythonic interface, lxml or cElementTree.
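For the "iterate over the nodes" part specifically, ElementTree's iterparse() is the piece that fits. Below is a minimal sketch using the standard library's xml.etree.ElementTree; the function name get_my_id_list is hypothetical (a snake_case take on the getMyIDList from the question). It yields each <myID> text as soon as its closing tag has been parsed, and clears the root after every hit so the processed elements don't pile up in memory:

```python
import xml.etree.ElementTree as ET


def get_my_id_list(source):
    """Yield the text of each <myID> element, one at a time.

    `source` may be a filename or a binary file object.
    Hypothetical helper name, not part of any library.
    """
    # Ask for both start and end events: the very first event is the
    # start of the root element, which gives us a handle on the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)

    for event, elem in context:
        # On the "end" event the element (including its text) is complete.
        if event == "end" and elem.tag == "myID":
            yield elem.text
            # Drop the children parsed so far so memory stays bounded,
            # even for a list with millions of entries.
            root.clear()
```

Usage is exactly the loop from the question: `for myid in get_my_id_list("ids.xml"): print(myid)`. Clearing the root is a common idiom; calling `elem.clear()` alone would empty each element but still leave it attached to the root.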

Otherwise, you don't have much chance of turning SAX into a generator,
apart from reading the whole document into memory (which rather defeats the
purpose of SAX in the first place) or running the parser in a separate
thread that hands its results over a queue to an iterable.
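The thread-plus-queue approach could look roughly like this. It is only a sketch under a few assumptions: the handler class MyIDHandler and the _DONE sentinel are hypothetical names, the queue is bounded so the parser thread pauses when the consumer falls behind, and a parse error is passed through the queue and re-raised in the consuming loop:

```python
import queue
import threading
import xml.sax

_DONE = object()  # hypothetical sentinel marking the end of the document


class MyIDHandler(xml.sax.ContentHandler):
    """Hypothetical handler: pushes each <myID> text onto a queue."""

    def __init__(self, out_queue):
        super().__init__()
        self.out = out_queue
        self.in_myid = False
        self.buf = []

    def startElement(self, name, attrs):
        if name == "myID":
            self.in_myid = True
            self.buf = []

    def characters(self, content):
        # SAX may deliver the text in several pieces, so collect them.
        if self.in_myid:
            self.buf.append(content)

    def endElement(self, name):
        if name == "myID":
            self.in_myid = False
            self.out.put("".join(self.buf))


def get_my_id_list(source):
    """Generator over <myID> texts, fed by a SAX parser in a thread."""
    q = queue.Queue(maxsize=100)  # bounded: parser blocks if we lag

    def worker():
        try:
            xml.sax.parse(source, MyIDHandler(q))
            q.put(_DONE)
        except Exception as exc:  # forward parse errors to the consumer
            q.put(exc)

    # Daemon thread, so an abandoned generator won't keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

The catch is that if the consumer stops early, the worker thread stays blocked on the full queue; that is part of why this is more cumbersome than the ElementTree route.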

Alternatively, there are parsers out there that implement a PULL style of
parsing instead of the PUSH style SAX uses. But before you start with those,
try ElementTree.
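As one example of the pull style, the standard library ships xml.dom.pulldom, where your code asks the parser for the next event instead of the parser calling into a handler. A minimal sketch, again with a hypothetical function name; expandNode() builds just the current <myID> element's children so its text can be read:

```python
from xml.dom import pulldom


def get_my_id_list(source):
    """Yield the text of each <myID> element using a pull parser.

    `source` may be a filename or a file object.
    Hypothetical helper name, not part of any library.
    """
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "myID":
            # Pull in this element's children (its text nodes) only.
            events.expandNode(node)
            # The text may arrive as several Text nodes, so join them.
            yield "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
```

Since the loop drives the parser, turning it into a generator needs no threads; memory stays modest because only the current element is expanded.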

Diez




