converting text and spans to an ElementTree

Neil Cerutti horpner at yahoo.com
Wed May 23 08:34:14 EDT 2007


On 2007-05-22, Steven Bethard <steven.bethard at gmail.com> wrote:
> Thanks a lot! This put me on the right track (though the
> devil's definitely in the details). It's working now::
>
>
> >>> tree = xmltools.text_and_spans_to_etree('aaa aaa aaaccc cccaaa', [
> ...     (etree.Element('a'), 0, 21),
> ...     (etree.Element('b'), 11, 11),
> ...     (etree.Element('c'), 11, 18),
> ... ])
> >>> etree.tostring(tree)
> '<a>aaa aaa aaa<b /><c>ccc ccc</c>aaa</a>'
> >>> tree = xmltools.text_and_spans_to_etree('bbb\naaaccc\ncccaaa', [
> ...     (etree.Element('a'), 0, 17),
> ...     (etree.Element('b'), 0, 4),
> ...     (etree.Element('c'), 7, 14),
> ...     (etree.Element('b'), 14, 14),
> ... ])
> >>> etree.tostring(tree)
> '<a><b>bbb\n</b>aaa<c>ccc\nccc</c><b />aaa</a>'
> >>> tree = xmltools.text_and_spans_to_etree('abc', [
> ...     (etree.Element('a'), 0, 3),
> ...     (etree.Element('b'), 0, 3),
> ...     (etree.Element('c'), 0, 3),
> ... ])
> >>> etree.tostring(tree)
> '<a><b><c>abc</c></b></a>'
>
>
> And for the sake of any poor soul who runs into a similar
> problem, here's the code I wrote using Gabriel's hints above::

When I saw you manually keeping a stack, I called Captain
Recursion on my Red-Alert Recursion Phone.

(Sorry, he left out the Element stuff, which he doesn't know or
use yet. The Captain's get_tree just returns a string.)

def get_tree(text, spans):
    """
    Return *text* marked up with *spans* as an XML string.

    Each span is (tag, start, end), and each span must lie inside the
    one before it (a single chain of nested spans). Sibling spans are
    not handled.

    >>> text = 'aaa aaa aaabbb bbbaaa'
    >>> spans = [
    ...     ('a', 0, 21),
    ...     ('b', 11, 18),
    ...     ('c', 18, 18),
    ... ]
    >>> get_tree(text, spans)
    '<a>aaa aaa aaa<b>bbb bbb<c /></b>aaa</a>'
    """
    if not spans:
        return ''
    else:
        head, tail = spans[0], spans[1:]
        elem, start, end = head
        # The next span is treated as this span's only child; with no
        # child, pretend there is an empty one at this span's end.
        if tail:
            _, follow_start, follow_end = tail[0]
        else:
            follow_start, follow_end = (end, end)
        if end > start:
            # Text before the child, the child's markup, text after it.
            return ("<%s>%s%s%s</%s>" %
                    (elem,
                     text[start:follow_start],
                     get_tree(text, tail),
                     text[follow_end:end],
                     elem))
        else:
            # A zero-width span becomes an empty element.
            return "<%s />%s" % (elem, get_tree(text, tail))
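
For anyone who does want the Element stuff, here is a minimal sketch (Python 3, standard library xml.etree.ElementTree) of a recursive version that takes (Element, start, end) triples the way Steven's text_and_spans_to_etree does. The names spans_to_etree and _build are hypothetical, not xmltools' API. It assumes the spans arrive in document order with each parent before its children, and it recurses once per nesting level rather than once per span, so sibling spans work too. One judgment call: a zero-width span sitting exactly at its parent's end is made a following sibling, as in Steven's second doctest, so the Captain's ('c', 18, 18) example would come out as <b>bbb bbb</b><c /> rather than <b>bbb bbb<c /></b>.

```python
import xml.etree.ElementTree as ET


def _build(text, spans, i):
    """Fill in spans[i] and its descendants; return (element, next unused index)."""
    elem, start, end = spans[i]
    i += 1
    pos = start          # first character of text not yet placed in the tree
    last_child = None    # text before the first child goes in .text, later text in a .tail
    # A following span is a child while it starts strictly before this span ends,
    # so a zero-width span sitting exactly at `end` becomes a following sibling.
    while i < len(spans) and spans[i][1] < end:
        _, child_start, child_end = spans[i]
        if child_start < pos or child_end > end:
            raise ValueError('span %r is not properly nested' % (spans[i],))
        gap = text[pos:child_start]
        if last_child is None:
            elem.text = gap
        else:
            last_child.tail = gap
        last_child, i = _build(text, spans, i)
        elem.append(last_child)
        pos = child_end
    # Whatever text is left belongs after the last child (or is the whole content).
    rest = text[pos:end]
    if last_child is None:
        elem.text = rest
    else:
        last_child.tail = rest
    return elem, i


def spans_to_etree(text, spans):
    """Hypothetical builder: spans are (Element, start, end), first span is the root."""
    root, used = _build(text, spans, 0)
    if used != len(spans):
        raise ValueError('span %r lies outside the first span' % (spans[used],))
    return root


if __name__ == '__main__':
    tree = spans_to_etree('aaa aaa aaaccc cccaaa', [
        (ET.Element('a'), 0, 21),
        (ET.Element('b'), 11, 11),
        (ET.Element('c'), 11, 18),
    ])
    print(ET.tostring(tree, encoding='unicode'))
    # prints <a>aaa aaa aaa<b /><c>ccc ccc</c>aaa</a>
```

ElementTree keeps text between children in each child's .tail, which is why the sketch tracks last_child instead of keeping a separate stack of open elements.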

-- 
Neil Cerutti


