converting text and spans to an ElementTree

Steven Bethard steven.bethard at gmail.com
Tue May 22 14:34:56 EDT 2007


attn.steven.kuo at gmail.com wrote:
> On May 21, 11:02 pm, Steven Bethard <steven.beth... at gmail.com> wrote:
>> I have some text and a list of Element objects and their offsets, e.g.::
>>
>>      >>> text = 'aaa aaa aaabbb bbbaaa'
>>      >>> spans = [
>>      ...     (etree.Element('a'), 0, 21),
>>      ...     (etree.Element('b'), 11, 18),
>>      ...     (etree.Element('c'), 18, 18),
>>      ... ]
>>
>> I'd like to produce the corresponding ElementTree. So I want to write a
>> get_tree() function that works like::
>>
>>      >>> tree = get_tree(text, spans)
>>      >>> etree.tostring(tree)
>>      '<a>aaa aaa aaa<b>bbb bbb<c /></b>aaa</a>'
>>
>> Perhaps I just need some more sleep, but I can't see an obvious way to
>> do this. Any suggestions?
> 
> It seems you're looking to construct an Interval Tree:
> 
>     http://en.wikipedia.org/wiki/Interval_tree

No, I'm looking to construct an ElementTree from intervals. ;-) Could 
you elaborate on how an Interval Tree would help me?
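
For reference, the nesting in the example above can also be produced with
a plain stack of open elements. What follows is only a rough sketch, not a
settled answer. It assumes that etree is the standard library's
xml.etree.ElementTree, that the spans nest properly (no partial overlaps),
that one span encloses all the others, and that spans with identical
offsets nest in the order they are listed. The helper names emit() and
close() are made up for illustration.

```python
import xml.etree.ElementTree as etree


def get_tree(text, spans):
    """Nest (element, start, end) spans into one tree, filling in text and tails."""
    if not spans:
        raise ValueError("need at least one span")
    # Outer spans first: earlier start, then later end. sorted() is stable,
    # so spans with identical offsets nest in their original list order
    # (an assumption, the original post does not say how ties should behave).
    ordered = sorted(spans, key=lambda span: (span[1], -span[2]))
    root = None
    stack = []          # currently open (element, end) pairs, outermost first
    last_closed = None  # last closed child of the innermost open element
    pos = ordered[0][1]  # offset up to which text has been attached

    def emit(upto):
        # Attach text[pos:upto] as the tail of the last closed child, or as
        # the text of the innermost open element if it has no children yet.
        nonlocal pos
        chunk = text[pos:upto]
        pos = upto
        if not chunk:
            return
        if last_closed is not None:
            last_closed.tail = (last_closed.tail or "") + chunk
        else:
            elem = stack[-1][0]
            elem.text = (elem.text or "") + chunk

    def close():
        # Flush text up to the end of the innermost open element, then pop it.
        nonlocal last_closed
        elem, end = stack[-1]
        emit(end)
        stack.pop()
        last_closed = elem

    for elem, start, end in ordered:
        # Close every open element that ends before the new span does.
        while stack and stack[-1][1] < end:
            if stack[-1][1] > start:
                raise ValueError("spans overlap without nesting")
            close()
        if not stack and root is not None:
            raise ValueError("spans do not share a single root")
        emit(start)
        if stack:
            stack[-1][0].append(elem)
        else:
            root = elem
        stack.append((elem, end))
        last_closed = None
    while stack:
        close()
    return root
```

Called with the text and spans from the original post, this should give the
tree shown there. Note that on Python 3, etree.tostring() returns bytes
unless it is called with encoding="unicode".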

STeVe


