converting text and spans to an ElementTree

Neil Cerutti horpner at yahoo.com
Thu May 24 08:49:07 EDT 2007


On 2007-05-23, Steven Bethard <steven.bethard at gmail.com> wrote:
> Neil Cerutti wrote:
>> On 2007-05-22, Steven Bethard <steven.bethard at gmail.com> wrote:
>>> Thanks a lot! This put me on the right track (though the
>>> devil's definitely in the details). It's working now::
>>>
>>>
>>>>>> tree = xmltools.text_and_spans_to_etree('aaa aaa aaaccc cccaaa', [
>>> ...     (etree.Element('a'), 0, 21),
>>> ...     (etree.Element('b'), 11, 11),
>>> ...     (etree.Element('c'), 11, 18),
>>> ... ])
>>>>>> etree.tostring(tree)
>>> '<a>aaa aaa aaa<b /><c>ccc ccc</c>aaa</a>'
>>>>>> tree = xmltools.text_and_spans_to_etree('bbb\naaaccc\ncccaaa', [
>>> ...     (etree.Element('a'), 0, 17),
>>> ...     (etree.Element('b'), 0, 4),
>>> ...     (etree.Element('c'), 7, 14),
>>> ...     (etree.Element('b'), 14, 14),
>>> ... ])
>>>>>> etree.tostring(tree)
>>> '<a><b>bbb\n</b>aaa<c>ccc\nccc</c><b />aaa</a>'
>>>>>> tree = xmltools.text_and_spans_to_etree('abc', [
>>> ...     (etree.Element('a'), 0, 3),
>>> ...     (etree.Element('b'), 0, 3),
>>> ...     (etree.Element('c'), 0, 3),
>>> ... ])
>>>>>> etree.tostring(tree)
>>> '<a><b><c>abc</c></b></a>'
>>>
>>>
>>> And for the sake of any poor soul who runs into a similar
>>> problem, here's the code I wrote using Gabriel's hints above::
>> 
>> When I saw you manually keeping a stack, I called Captain
>> Recursion on my Red-Alert Recursion Phone.
>> 
>> (I'm sorry he left out the Element stuff, which he doesn't know
>> or use yet. The Captain's get_tree just returns the string.)
>
> Heh heh.
>
> I actually thought about writing it recursively, but note that
> you need both recursive and non-recursive parts of this
> algorithm to do the ElementTree part right:

You mean... I left out the hard part?  Shucks. I had really hoped
it didn't matter.

> * the recursive (or stack) part assigns children to parents
> * the non-recursive part assigns text or tail to the previous element
>    (note that's previous in a sequential sense, not a recursive sense)
>
> I'm sure I could implement this recursively, passing around
> another appropriate argument, but it wasn't obvious to me that
> the code would be any cleaner.

Moreover, it looks like you have experience in writing that sort
of code. I'd have never even attempted it without recursion, but
that's merely exposing one of my limitations. ;)
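
For anyone following along, here is a minimal sketch of the two parts
Steven describes above. It is not his code (which isn't quoted in this
message), only a guess at the shape of it. The stack assigns children to
parents, and a separate "previous element" pointer decides whether the
next chunk of text becomes .text or .tail. It assumes the spans are
properly nested and listed in document order, as in the quoted examples,
and that a span ending exactly where the next one starts is closed first.
The helper names flush and close_top are made up for illustration.

```python
import xml.etree.ElementTree as etree

def text_and_spans_to_etree(text, spans):
    """Sketch only: assumes nested spans given in document order."""
    root = None
    stack = []                 # open (element, end) pairs, innermost last
    pos = 0                    # how much of `text` has been placed so far
    prev, slot = None, "text"  # element/attribute that gets the next text

    def flush(upto):
        # Non-recursive part: hand text[pos:upto] to the previous element,
        # as .text if it is still open, as .tail if it was just closed.
        nonlocal pos
        chunk = text[pos:upto]
        if chunk:
            if prev is None:
                raise ValueError("text found before the root element")
            setattr(prev, slot, (getattr(prev, slot) or "") + chunk)
        pos = max(pos, upto)

    def close_top():
        # Close the innermost open element; later text becomes its tail.
        nonlocal prev, slot
        elem, end = stack.pop()
        flush(end)
        prev, slot = elem, "tail"

    for elem, start, end in spans:
        # Assumption: an element ending at or before `start` is finished.
        while stack and stack[-1][1] <= start:
            close_top()
        flush(start)
        # Stack part: the innermost open element is the parent.
        if stack:
            stack[-1][0].append(elem)
        elif root is None:
            root = elem
        else:
            raise ValueError("spans must share a single root")
        stack.append((elem, end))
        prev, slot = elem, "text"

    while stack:
        close_top()
    return root

tree = text_and_spans_to_etree('abc', [
    (etree.Element('a'), 0, 3),
    (etree.Element('b'), 0, 3),
    (etree.Element('c'), 0, 3),
])
print(etree.tostring(tree, encoding="unicode"))
```

Note that prev follows the sequential order of the elements, not the
nesting: after a child is closed, the following text goes to the child's
tail, not to the parent's text. That is the part a purely recursive
version would have to pass around as an extra argument.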

-- 
Neil Cerutti


