When to clear elements using cElementTree

Ben Temperton btemperton at gmail.com
Fri Oct 19 17:15:37 EDT 2012


I managed to solve this using the following method:

        """Return a dictionary mapping the index of each spectrum that has
        secondary scans to a list of the indexes of those scans.
        """
        scans = dict()

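        # get an iterable; ask for "start" events as well, so that the first
        # event delivered is the start of the root element
        context = cElementTree.iterparse(self.info['filename'], events=("start", "end"))

        # turn it into an iterator
        context = iter(context)

        # get the root element (the first "start" event)
        event, root = next(context)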

        for event, elem in context:
            if event == "end" and elem.tag == self.XML_SPACE + "scan":
                parentId = int(elem.get('num'))
                for child in elem.findall(self.XML_SPACE + 'scan'):
                    childId = int(child.get('num'))
                    # create the list for this parent on first use, then append
                    scans.setdefault(parentId, []).append(childId)
                    child.clear()
                root.clear()
        return scans

I think the trick is to wait for the 'end' event: only then has iterparse read the whole element, children included, so that is the point where it is safe to use the data and clear it. I'm still not quite clear on whether this is the best way to do it.
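
For anyone who wants to try this without the original data file, here is a minimal, self-contained sketch of the same idea. It is not the method from the post above but an illustration under a few assumptions: it uses Python 3's xml.etree.ElementTree in place of cElementTree, a small made-up document without an XML namespace, and a hypothetical function name, secondary_scans. It also adds a depth counter (not in the original) so the root is cleared only after an outermost scan has ended.

```python
import io
import xml.etree.ElementTree as ET  # stand-in for cElementTree (assumption)

# Made-up sample: scan 1 has two secondary scans, scan 4 has none.
SAMPLE = b"""<msRun>
  <scan num="1">
    <scan num="2"/>
    <scan num="3"/>
  </scan>
  <scan num="4"/>
</msRun>"""


def secondary_scans(source, tag="scan"):
    """Map each parent scan number to the numbers of its nested scans."""
    scans = {}
    # "start" events are requested so that the first event yields the root
    # and so that nesting depth can be tracked.
    context = iter(ET.iterparse(source, events=("start", "end")))
    event, root = next(context)
    depth = 0  # number of <scan> elements currently open
    for event, elem in context:
        if elem.tag != tag:
            continue
        if event == "start":
            depth += 1
            continue
        # "end" event: the element and all of its children are complete.
        depth -= 1
        children = [int(child.get("num")) for child in elem.findall(tag)]
        if children:
            scans[int(elem.get("num"))] = children
        if depth == 0:
            # An outermost scan has ended, so nothing still needs its data.
            root.clear()
    return scans


result = secondary_scans(io.BytesIO(SAMPLE))
print(result)  # {1: [2, 3]}
```

Clearing only at depth 0 keeps the nested scans readable until their parent has been processed. Calling clear() on every scan as soon as it ends would wipe each child's 'num' attribute before the parent's 'end' event arrives.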


