parsing multiple root element XML into text

Burak Arslan burak.arslan at arskom.com.tr
Fri May 9 12:52:47 EDT 2014


On 05/09/14 16:55, Stefan Behnel wrote:
> ElementTree has gained a nice API in
> Py3.4 that supports this in a much saner way than SAX, using iterators.
> Basically, you just dump in some data that you received and get back an
> iterator over the elements (and their subtrees) that it generated from it.
> Intercept on the right top elements and you get your next subtree as soon
> as it's ready.
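
For reference, the API described above can be sketched roughly as follows with the standard library's xml.etree.ElementTree.XMLPullParser (Python 3.4+). The function name iter_top_level_elements, the synthetic <wrapper> element and the chunk size are illustrative assumptions, not something from this thread. The sketch also assumes the stream carries no XML declaration or DOCTYPE, since the fragments are parsed inside the synthetic wrapper.

```python
import io
from xml.etree.ElementTree import XMLPullParser


def iter_top_level_elements(stream, chunk_size=1024):
    """Yield each top-level element of a multi-root text stream once it closes.

    Assumption: the stream has no XML declaration or DOCTYPE, because all
    fragments are parsed as children of a synthetic <wrapper> element.
    """
    parser = XMLPullParser(events=("start", "end"))
    parser.feed("<wrapper>")  # hypothetical synthetic root
    depth = 0
    done = False
    while not done:
        chunk = stream.read(chunk_size)
        if chunk:
            parser.feed(chunk)
        else:
            # End of input: close the wrapper and let the parser finish.
            parser.feed("</wrapper>")
            parser.close()
            done = True
        # Drain whatever events the parser has produced so far.
        for event, element in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                # depth == 1 means a direct child of <wrapper> just closed.
                if depth == 1:
                    yield element


if __name__ == "__main__":
    source = io.StringIO("<a>1</a><b><c>2</c></b><a>3</a>")
    for element in iter_top_level_elements(source, chunk_size=5):
        print(element.tag, element.text)
```

Tracking depth instead of tag names keeps the sketch independent of what the top-level elements are called. Text between the roots (for example newlines) ends up as the tail of the preceding element.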


Hi Stefan,

Here's a small script:

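    from collections import deque
    from lxml import etree

    events = etree.iterparse(istr, events=("start", "end"))
    stack = deque()  # currently open elements
    for event, element in events:
        if event == "start":
            stack.append(element)
        elif event == "end":
            stack.pop()

        # An empty stack means the first root element has just closed.
        if len(stack) == 0:
            break

        print(istr.tell(), "%5s, %4s, %s" % (event, element.tag, element.text))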

where istr is an input-stream. (Fully working example:
https://gist.github.com/plq/025005a71e8135c46800)

I was expecting to have istr.tell() return the position where the first
root element ends, which would make it possible to continue parsing with
another call to etree.iterparse(). But istr.tell() already returns the
position of EOF after the first call to next() on the iterator that
iterparse() returns.
Without the stack check, the loop eventually throws an exception and the
offset value in that exception is None.
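
A likely explanation (an assumption, not confirmed in this thread) is that iterparse() reads the stream in large fixed-size chunks and feeds each chunk to the parser, so tell() reports how far the reader has got, not how far the parser has got. The sketch below shows the same effect with the standard library's xml.etree.ElementTree instead of lxml, on a small hypothetical in-memory document:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, far smaller than one read chunk.
data = b"<root><item>1</item><item>2</item></root>"
istr = io.BytesIO(data)

events = ET.iterparse(istr, events=("start", "end"))
first_event, first_element = next(events)

# Only the opening <root> tag has been reported so far, yet the whole
# input has already been read from the stream.
position_after_first_event = istr.tell()
print(first_event, first_element.tag, position_after_first_event, len(data))
```

If that is what happens in lxml too, the stream position cannot be used to find where the first root element ends.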

So I'm lost here: how would it be possible to parse the OP's document with lxml?

Best,
Burak