converting text and spans to an ElementTree

Neil Cerutti horpner at yahoo.com
Thu May 24 15:52:29 EDT 2007


On 2007-05-24, Neil Cerutti <horpner at yahoo.com> wrote:
> On 2007-05-23, Steven Bethard <steven.bethard at gmail.com> wrote:
> You mean... I left out the hard part?  Shucks. I had really
> hoped it didn't matter.
>
>> * the recursive (or stack) part assigns children to parents
>> * the non-recursive part assigns text or tail to the previous element
>>    (note that's previous in a sequential sense, not a recursive sense)
>>
>> I'm sure I could implement this recursively, passing around
>> another appropriate argument, but it wasn't obvious to me
>> that the code would be any cleaner.
>
> Moreover, it looks like you have experience in writing that
> sort of code. I'd have never even attempted it without
> recursion, but that's merely exposing one of my limitations. ;)
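
For anyone following along, the text/tail distinction in Steven's second
bullet can be seen in a small sketch (the variable names are only for
illustration): an element's text is the string right after its start tag,
and its tail is the string right after its end tag, which still sits
inside the parent.

```python
import xml.etree.ElementTree as etree

# Build <a>aaa<b>bbb</b>ccc</a> by hand.
a = etree.Element('a')
a.text = 'aaa'                 # text directly after <a>
b = etree.SubElement(a, 'b')   # b becomes a child of a
b.text = 'bbb'                 # text inside <b>
b.tail = 'ccc'                 # text after </b>, still inside <a>

# encoding='unicode' asks for a str rather than bytes (Python 3).
result = etree.tostring(a, encoding='unicode')
print(result)
```

So the 'ccc' that belongs to a in the document is stored on b, which is
why the text assignment follows sequential order rather than nesting order.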

You'll be happy to know I found a way to salvage my simple
recursive solution, and make it generate an ElementTree!

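# Assumed import; the post does not show which etree module is in use.
import xml.etree.ElementTree as etree

def get_tree(text, spans):
    """
    >>> text = 'aaa aaa aaabbb bbbaaa'
    >>> spans = [
    ...     (etree.Element('a'), 0, 21),
    ...     (etree.Element('b'), 11, 18),
    ...     (etree.Element('c'), 18, 18),
    ... ]

    I'd like to produce the corresponding ElementTree. So I want to write a
    get_tree() function that works like::

    >>> etree.tostring(get_tree(text, spans))
    '<a>aaa aaa aaa<b>bbb bbb<c /></b>aaa</a>'
    """
    def helper(text, spans):
        # Builds the XML as a string. Only each element's tag is used, and
        # every span is taken to nest inside the one before it (no siblings).
        # Text is inserted unescaped, so '&' or '<' would break etree.XML().
        if not spans:
            return ''
        else:
            head, tail = spans[0], spans[1:]
            elem, start, end = head
            if tail:
                # The next span marks where this element's leading text stops
                # and where its trailing text resumes.
                _, follow_start, follow_end = tail[0]
            else:
                follow_start, follow_end = (end, end)
            if end > start:
                return ("<%s>%s%s%s</%s>" %
                        (elem.tag,
                            text[start:follow_start],
                            helper(text, tail),
                            text[follow_end:end],
                            elem.tag))
            else:
                # Zero-length span: emit an empty element.
                return "<%s />%s" % (elem.tag, helper(text, tail))
    return etree.XML(helper(text, spans))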

But at least I learned just a *little* about XML and Python
during this arduous process. ;)
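
The string-building helper above works when every span sits inside the one
before it. As a rough sketch of the stack-based alternative outlined in the
quoted bullets (not Steven's actual code; build_tree and _add_text are
names made up here), the tree can also be built directly: a stack assigns
children to parents, and each run of text goes to the text or tail of the
sequentially previous element. It assumes the spans arrive in document
order, parents before children, with the first span as the root.

```python
import xml.etree.ElementTree as etree


def _add_text(elem, slot, chunk):
    """Append chunk to elem.text or elem.tail; slot names which one."""
    if chunk:
        setattr(elem, slot, (getattr(elem, slot) or '') + chunk)


def build_tree(text, spans):
    """Hypothetical sketch: link (element, start, end) spans into one tree.

    Assumes spans are in document order, parents listed before children,
    and spans[0] is the root that covers every other span.
    """
    root, pos, root_end = spans[0]
    stack = [(root, root_end)]   # open elements with their end offsets
    prev, slot = root, 'text'    # where the next run of text belongs

    for elem, start, end in spans[1:]:
        # Stack part: close open elements that cannot contain this span.
        while end > stack[-1][1]:
            closed, closed_end = stack.pop()
            _add_text(prev, slot, text[pos:closed_end])
            prev, slot, pos = closed, 'tail', closed_end
        # Sequential part: text before this span goes to the previous element.
        _add_text(prev, slot, text[pos:start])
        stack[-1][0].append(elem)    # innermost open element is the parent
        stack.append((elem, end))
        prev, slot, pos = elem, 'text', start

    # Close whatever is still open, innermost first.
    while stack:
        closed, closed_end = stack.pop()
        _add_text(prev, slot, text[pos:closed_end])
        prev, slot, pos = closed, 'tail', closed_end
    return root


# Usage with the spans from the docstring above.
text = 'aaa aaa aaabbb bbbaaa'
spans = [
    (etree.Element('a'), 0, 21),
    (etree.Element('b'), 11, 18),
    (etree.Element('c'), 18, 18),
]
print(etree.tostring(build_tree(text, spans), encoding='unicode'))
```

Because the Element objects themselves are linked together, any attributes
they carry survive, and sibling spans such as (0, 2) and (6, 8) under one
root are handled too. The cost is that the input elements are modified in
place.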

-- 
Neil Cerutti
The concert held in Fellowship Hall was a great success. Special thanks are
due to the minister's daughter, who labored the whole evening at the piano,
which as usual fell upon her. --Church Bulletin Blooper


