Delete h2 until you reach the next h2 in beautifulsoup

rosefox911 at gmail.com
Sun Nov 6 01:27:24 EDT 2016


Consider the following HTML:

    <h2 id="example">cool stuff</h2> <ul> <li>hi</li> </ul> <div> <h2 id="cool"></h2> <ul><li>zz</li> </ul> </div>

and the following list:

    ignore_list = ['example','lalala']

My goal is this: while walking through the HTML with BeautifulSoup, whenever I find an h2 whose id is in my list (ignore_list), I should delete all the ul and li elements under it until I reach another h2. I would then check whether that next h2 is in my ignore list; if it is, delete all the ul and li elements until I reach the following h2 (or, if there are no h2s left, delete the ul and li elements under the current one and stop).

How I see the process going: read all the h2s from top to bottom in the DOM. If the id of any of them is in ignore_list, delete all the ul and li elements under that h2 until you reach the NEXT h2. If there is no next h2, delete the remaining ul and li elements and then stop.
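
To make the intended behaviour concrete, here is a rough sketch of the process described above. It assumes the bs4 package with the built-in "html.parser" backend, and the function name strip_lists_after_ignored_h2 is just a placeholder I made up. It also assumes every li sits inside a ul, so removing a ul takes its li children with it.

```python
from bs4 import BeautifulSoup


def strip_lists_after_ignored_h2(html, ignore_ids):
    """Remove every <ul> that follows an <h2> whose id is in ignore_ids,
    up to (but not including) the next <h2> in document order."""
    soup = BeautifulSoup(html, "html.parser")

    # First pass: walk the h2 and ul tags top to bottom and only *collect*
    # the lists to drop, so the tree is not modified while it is being read.
    doomed = []
    deleting = False
    for tag in soup.find_all(["h2", "ul"]):
        if tag.name == "h2":
            # Each h2 switches deletion on or off for what follows it.
            deleting = tag.get("id") in ignore_ids
        elif deleting:
            doomed.append(tag)

    # Second pass: detach the collected lists (their <li> children go too).
    for ul in doomed:
        ul.extract()
    return soup


sample_html = (
    '<h2 id="example">cool stuff</h2> <ul> <li>hi</li> </ul> '
    '<div> <h2 id="cool"></h2> <ul><li>zz</li> </ul> </div>'
)
ignore_list = ["example", "lalala"]

result = strip_lists_after_ignored_h2(sample_html, ignore_list)
print(result)
```

With the sample above, the list under the "example" heading is removed, while the list under "cool" is kept because "cool" is not in ignore_list. The headings themselves are left in place.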

Here is the full HTML I am trying to work with: http://pastebin.com/Z3ev9c8N

I am trying to delete all the ul and li elements after "See_also". How would I accomplish this in Python?


