lxml removing tag, keeping text order
Tim Arnold
tim.arnold at sas.com
Mon Oct 27 12:43:01 EDT 2008
"Stefan Behnel" <stefan_ml at behnel.de> wrote in message
news:4902e522$0$17382$9b4e6d93 at newsspool1.arcor-online.net...
> Tim Arnold wrote:
>> Hi,
>> I'm using lxml to clean up auto-generated XML so that it validates against
>> a DTD. I need to remove an element tag but keep the text in order. For
>> example:
>> s0 = '''
>> <option>
>> <optional> first text
>> <someelement>ladida</someelement>
>> <emphasis>emphasized text</emphasis>
>> middle text
>> <anotherelement/>
>> last text
>> </optional>
>> </option>'''
>>
>> I want to get rid of the <emphasis> tag but keep everything else as it is;
>> that is, I need this result:
>>
>> <option>
>> <optional> first text
>> <someelement>ladida</someelement>
>> emphasized text
>> middle text
>> <anotherelement/>
>> last text
>> </optional>
>> </option>
>
> There's a drop_tag() method in lxml.html (lxml/html/__init__.py) that does
> what you want. Just copy the code over to your code base and adapt it as
> needed.
>
> Stefan
Thanks, Stefan. I was going crazy with this. That method is going to be quite
useful for my project, and it's good to learn from too; I was making it too
hard.
Thanks,
--Tim Arnold
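
For readers who land on this thread, here is a minimal sketch of the approach
Stefan describes: a standalone drop_tag() function for plain lxml.etree
elements, modeled on the drop_tag() method in lxml.html. The function name,
its standalone form, and the sample document are illustrative assumptions,
not code copied from lxml. The idea is to splice the element's text and tail
into the neighbouring text slots, then replace the element with its children.

```python
from lxml import etree


def drop_tag(el):
    """Remove el's tag but keep its text, children, and tail in order.

    Sketch modeled on lxml.html's drop_tag(); the root element cannot be
    dropped because it has no parent to merge into.
    """
    parent = el.getparent()
    if parent is None:
        raise ValueError("cannot drop the root element")
    previous = el.getprevious()

    def append_text(text):
        # Text before the first child lives in parent.text; text after a
        # sibling lives in that sibling's .tail.
        if previous is None:
            parent.text = (parent.text or "") + text
        else:
            previous.tail = (previous.tail or "") + text

    # Only real elements have a string tag; skip the text of comments and
    # processing instructions.
    if el.text and isinstance(el.tag, str):
        append_text(el.text)
    if el.tail:
        if len(el):
            # The tail follows the last child once the children move up.
            last = el[-1]
            last.tail = (last.tail or "") + el.tail
        else:
            append_text(el.tail)

    # Replace the element with its own children (possibly none).
    index = parent.index(el)
    parent[index:index + 1] = el[:]


root = etree.fromstring(
    "<option><optional> first text <someelement>ladida</someelement> "
    "<emphasis>emphasized text</emphasis> middle text <anotherelement/> "
    "last text </optional></option>"
)
# Collect the matches first so the tree is not modified while iterating.
for emphasis in list(root.iter("emphasis")):
    drop_tag(emphasis)
result = etree.tostring(root, encoding="unicode")
print(result)
```

Later lxml releases also provide etree.strip_tags(root, "emphasis"), which
removes every matching tag in a subtree while keeping text and children; if
your installed version has it, that is likely the simpler choice.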