elementtree: line numbers and iterparse

Fredrik Lundh fredrik at pythonware.com
Wed Sep 13 00:24:13 EDT 2006


Stuart McGraw wrote:

> I have a broad (~200K nodes) but shallow xml file
> I want to parse with Elementtree.  There are too many 
> nodes to read into memory simultaneously so I use
> iterparse() to process each node sequentially.
> 
> Now I find I need to get and save the input file line
> number of each node.  Googling turned up a way 
> to do it by subclassing FancyTreeBuilder,
> (http://groups.google.com/group/comp.lang.python/msg/45f5313409553b4b?hl=en&)
> but that tries to read everything at once.
> 
> Is there a way to do something similar with iterparse()?

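something like this could work; the wrapper hands the parser one line per
read() call, so its line counter stays in step with the events: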

import elementtree.ElementTree as ET
import StringIO

data = """\
<doc>
   <tag>
     <subtag>text</subtag>
     <subtag>text</subtag>
   </tag>
</doc>
"""

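class FileWrapper:
     # file-like wrapper that hands the parser one line per read() call,
     # so lineno tracks how far into the input the parser has got
     def __init__(self, source):
         self.source = source
         self.lineno = 0
     def read(self, size):
         # the requested size is ignored; always return a single line
         s = self.source.readline()
         if s:
             self.lineno += 1 # don't count the empty read at end of file
         return s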

# f = FileWrapper(open("source.xml"))
f = FileWrapper(StringIO.StringIO(data))

for event, elem in ET.iterparse(f, events=["start", "end"]):
     if event == "start":
         print f.lineno, event, elem
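
for readers on Python 3, the same idea can be sketched against the standard library's xml.etree.ElementTree, which is assumed here to behave like the elementtree package used above. The names LineCountingReader and start_lines are hypothetical, chosen only for this illustration. The number reported is the last line handed to the parser when the event fired, so for a start tag that spans several lines it is presumably the line on which the tag ends, not the one where it begins.

```python
import io
import xml.etree.ElementTree as ET

DATA = b"""\
<doc>
   <tag>
     <subtag>text</subtag>
     <subtag>text</subtag>
   </tag>
</doc>
"""


class LineCountingReader:
    """Hypothetical helper: feeds the parser one line per read() call."""

    def __init__(self, source):
        self.source = source  # binary file-like object
        self.lineno = 0       # lines handed to the parser so far

    def read(self, size=-1):
        # The requested size is ignored on purpose: returning a single
        # line keeps the counter close to the parser's position.
        line = self.source.readline()
        if line:
            self.lineno += 1  # do not count the empty read at end of file
        return line


def start_lines(source):
    """Return (line number, tag) pairs for every start event."""
    reader = LineCountingReader(source)
    result = []
    for event, elem in ET.iterparse(reader, events=("start", "end")):
        if event == "start":
            result.append((reader.lineno, elem.tag))
        else:
            # Drop the element's text, attributes and children once it
            # has been handled, to keep memory use down on large files.
            elem.clear()
    return result


if __name__ == "__main__":
    for lineno, tag in start_lines(io.BytesIO(DATA)):
        print(lineno, tag)
```

reading one line at a time is slower than letting iterparse pull large chunks, so this trades some speed for the line bookkeeping.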

</F>



