converting text and spans to an ElementTree

Steven Bethard steven.bethard at gmail.com
Wed May 23 16:37:39 EDT 2007


Neil Cerutti wrote:
> On 2007-05-22, Steven Bethard <steven.bethard at gmail.com> wrote:
>> Thanks a lot! This put me on the right track (though the
>> devil's definitely in the details). It's working now::
>>
>>
>> >>> tree = xmltools.text_and_spans_to_etree('aaa aaa aaaccc cccaaa', [
>> ...     (etree.Element('a'), 0, 21),
>> ...     (etree.Element('b'), 11, 11),
>> ...     (etree.Element('c'), 11, 18),
>> ... ])
>> >>> etree.tostring(tree)
>> '<a>aaa aaa aaa<b /><c>ccc ccc</c>aaa</a>'
>> >>> tree = xmltools.text_and_spans_to_etree('bbb\naaaccc\ncccaaa', [
>> ...     (etree.Element('a'), 0, 17),
>> ...     (etree.Element('b'), 0, 4),
>> ...     (etree.Element('c'), 7, 14),
>> ...     (etree.Element('b'), 14, 14),
>> ... ])
>> >>> etree.tostring(tree)
>> '<a><b>bbb\n</b>aaa<c>ccc\nccc</c><b />aaa</a>'
>> >>> tree = xmltools.text_and_spans_to_etree('abc', [
>> ...     (etree.Element('a'), 0, 3),
>> ...     (etree.Element('b'), 0, 3),
>> ...     (etree.Element('c'), 0, 3),
>> ... ])
>> >>> etree.tostring(tree)
>> '<a><b><c>abc</c></b></a>'
>>
>>
>> And for the sake of any poor soul who runs into a similar
>> problem, here's the code I wrote using Gabriel's hints above::
> 
> When I saw you manually keeping a stack, I called Captain
> Recursion on my Red-Alert Recursion Phone.
> 
> (I'm sorry he left out the Element stuff, which he doesn't know
> or use yet. The Captain's get_tree just returns the string)

Heh heh.

I actually thought about writing it recursively, but note that you need 
both recursive and non-recursive parts of this algorithm to do the 
ElementTree part right:

* the recursive (or stack) part assigns children to parents
* the non-recursive part assigns text or tail to the previous element
   (note that's previous in a sequential sense, not a recursive sense)

I'm sure I could implement this recursively, passing around another 
appropriate argument, but it wasn't obvious to me that the code would be 
any cleaner.
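
For anyone trying to picture the two parts listed above, here is a minimal sketch of how they might fit together. It is not the code from the earlier post, only a reconstruction under assumptions: spans arrive in document order as (element, start, end) tuples, they nest properly under a single root, and a span closes as soon as the next span starts at or after its end. The helper names flush and close_top are made up for illustration, and the sketch targets Python 3 with the standard-library xml.etree.ElementTree.

```python
import xml.etree.ElementTree as etree


def text_and_spans_to_etree(text, spans):
    """Build an element tree from text plus (element, start, end) spans.

    Assumes spans are in document order and properly nested under one root.
    """
    stack = []           # stack part: (element, end) for each open element
    root = None
    prev = None          # sequential part: element most recently opened/closed
    prev_closed = False  # True if the last event was a close, False if an open
    pos = 0              # how much of the text has been handed out so far

    def flush(upto):
        # Give text[pos:upto] to the previous element: .text if that element
        # is still open, .tail if it has already been closed.
        nonlocal pos
        chunk, pos = text[pos:upto], upto
        if chunk and prev is not None:
            attr = 'tail' if prev_closed else 'text'
            setattr(prev, attr, (getattr(prev, attr) or '') + chunk)

    def close_top():
        nonlocal prev, prev_closed
        elem, end = stack.pop()
        flush(end)
        prev, prev_closed = elem, True

    for elem, start, end in spans:
        # Close every open element that ends at or before this span's start.
        while stack and stack[-1][1] <= start:
            close_top()
        flush(start)
        if stack:
            stack[-1][0].append(elem)  # child goes to the innermost open parent
        elif root is None:
            root = elem
        else:
            raise ValueError('spans must nest under a single root')
        stack.append((elem, end))
        prev, prev_closed = elem, False

    while stack:
        close_top()
    return root


tree = text_and_spans_to_etree('abc', [
    (etree.Element('a'), 0, 3),
    (etree.Element('b'), 0, 3),
    (etree.Element('c'), 0, 3),
])
print(etree.tostring(tree, encoding='unicode'))  # <a><b><c>abc</c></b></a>
```

The stack only ever decides which parent a new element is appended to. The text and tail bookkeeping ignores the stack and looks solely at whichever element was opened or closed last, which is the "previous in a sequential sense" point. Overlapping spans are not detected here; a real version would want to check for them.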

STeVe


