lxml removing tag, keeping text order

Tim Arnold tim.arnold at sas.com
Mon Oct 27 12:43:01 EDT 2008


"Stefan Behnel" <stefan_ml at behnel.de> wrote in message 
news:4902e522$0$17382$9b4e6d93 at newsspool1.arcor-online.net...
> Tim Arnold schrieb:
>> Hi,
>> I'm using lxml to clean up auto-generated XML so that it validates
>> against a DTD; I need to remove an element's tag but keep its text in
>> order. For example:
>> s0 = '''
>> <option>
>>   <optional> first text
>>     <someelement>ladida</someelement>
>>     <emphasis>emphasized text</emphasis>
>>     middle text
>>     <anotherelement/>
>>     last text
>>   </optional>
>> </option>'''
>>
>> I want to get rid of the <emphasis> tag but keep everything else as it is;
>> that is, I need this result:
>>
>> <option>
>>   <optional> first text
>>     <someelement>ladida</someelement>
>>     emphasized text
>>     middle text
>>     <anotherelement/>
>>     last text
>>   </optional>
>> </option>
>
> There's a drop_tag() method in lxml.html (lxml/html/__init__.py) that does
> what you want. Just copy the code over to your code base and adapt it as 
> needed.
>
> Stefan
Thanks Stefan, I was going crazy with this. That method is going to be quite 
useful for my project and it's good to learn from too; I was making it too 
hard.
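
For the archives, here is a minimal sketch of how that adaptation might look
as a standalone function. It is my own reading of the idea behind drop_tag(),
not a copy of the lxml.html source. The drop_tag(parent, el) signature, the
append_text helper and the fallback import are my assumptions, and the sample
document is a compacted version of s0 from my original post.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the helper below only uses API shared with the stdlib,
    # so the sketch still runs where lxml is not installed.
    import xml.etree.ElementTree as etree


def drop_tag(parent, el):
    """Remove el's tag from parent, keeping its text, children and tail in order."""
    index = list(parent).index(el)
    previous = parent[index - 1] if index > 0 else None

    def append_text(text):
        # Text before the first child lives in parent.text; anywhere else it
        # is the tail of the preceding sibling.
        if previous is None:
            parent.text = (parent.text or "") + text
        else:
            previous.tail = (previous.tail or "") + text

    if el.text:
        append_text(el.text)
    if el.tail:
        if len(el):
            # el has children: its tail has to follow the last of them.
            el[-1].tail = (el[-1].tail or "") + el.tail
        else:
            append_text(el.tail)
    # Replace el with its own children (an empty list simply deletes it).
    parent[index:index + 1] = list(el)


s0 = ("<option><optional> first text"
      "<someelement>ladida</someelement>"
      "<emphasis>emphasized text</emphasis> middle text"
      "<anotherelement/> last text</optional></option>")
root = etree.fromstring(s0)

# Collect (parent, child) pairs first, then work backwards so that nested
# matches are handled before their ancestors.
targets = [(p, c) for p in root.iter() for c in p if c.tag == "emphasis"]
for parent, el in reversed(targets):
    drop_tag(parent, el)

print(etree.tostring(root, encoding="unicode"))
```

With real lxml elements the parent is simply el.getparent(), so the call
becomes drop_tag(el.getparent(), el). In lxml versions that provide it,
etree.strip_tags(root, "emphasis") should do the same job in one call.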

thanks,
--Tim Arnold 




