Removing certain tags from HTML files

Fri Jul 27 14:50:49 EDT 2007

>
> Then take hold of the tag's contents and move them into its parent, in the
> tag's place.  Something like this should work:
>
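> from bs4 import BeautifulSoup
>
> def remove(soup, tagname):
>     """Delete every <tagname> element but keep its children in place."""
>     for tag in soup.find_all(tagname):
>         parent = tag.parent
>         position = parent.index(tag)
>         # Copy the list first: moving a child out of the tag mutates
>         # tag.contents, which would otherwise skip every other child.
>         for offset, child in enumerate(list(tag.contents)):
>             parent.insert(position + offset, child)
>         tag.extract()
>
> def main():
>     source = '<a><b>This is a <c>Test</c></b></a>'
>     soup = BeautifulSoup(source, 'html.parser')
>     print(soup)
>     remove(soup, 'b')
>     print(soup)
>
> if __name__ == '__main__':
>     main()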
>
> > Is re the right module for that? Basically, if I write a loop that
> > scans the text and tries to match every occurrence of a given regular
> > expression, would that be a good idea?
>
> No, regular expressions are not a very good idea.  They get very
> complicated very quickly and still often miss some corner cases.
>
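To make the corner-case point concrete, here is a minimal sketch (not from the thread) of the "obvious" regex for stripping `<b>` and `</b>` tags. The pattern `NAIVE_B_TAG`, the helper `strip_b_naive` and the sample strings are illustrative assumptions; the point is only to show how such a pattern misfires.

```python
import re

# A deliberately naive pattern for "an opening or closing <b> tag".
NAIVE_B_TAG = re.compile(r'</?b[^>]*>')

def strip_b_naive(html):
    """Remove <b>/</b> tags with a regex (illustration only, hypothetical helper)."""
    return NAIVE_B_TAG.sub('', html)

samples = [
    '<b>bold</b>',                # the easy case works
    'line one<br>line two',       # <br> also starts with "<b", so it is removed too
    '<b title="x > y">bold</b>',  # a ">" inside an attribute ends the match early
    '<B>bold</B>',                # HTML tag names are case-insensitive; the regex is not
]

for s in samples:
    print(repr(strip_b_naive(s)))
```

Each failure can be patched (word boundaries, quoted-attribute handling, `re.IGNORECASE`), but every patch makes the pattern harder to read, and comments, `<script>` bodies and malformed markup are still waiting.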

Thanks a lot for that.

It's true that regular expressions could give me headaches (especially
when it comes to finding where a tag ends).
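If installing BeautifulSoup is not an option, the standard library's `html.parser.HTMLParser` can do the work of finding where each tag starts and ends. This is a sketch under my own assumptions, not something proposed in the thread: `TagStripper` and `remove_tags` are hypothetical names, and the round trip is only approximately faithful (end tags are re-emitted in lowercase, and CDATA sections are not handled). Like the BeautifulSoup version, it drops the named tags but keeps their contents.

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Re-emit HTML as written, except for start/end tags named in `drop`."""

    def __init__(self, drop):
        # convert_charrefs=False so entities like &amp; arrive unconverted.
        super().__init__(convert_charrefs=False)
        self.drop = {name.lower() for name in drop}  # parser lowercases tag names
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.drop:
            self.out.append(self.get_starttag_text())  # raw text, attributes intact

    def handle_startendtag(self, tag, attrs):
        if tag not in self.drop:
            self.out.append(self.get_starttag_text())  # e.g. "<br/>"

    def handle_endtag(self, tag):
        if tag not in self.drop:
            self.out.append(f'</{tag}>')

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f'&{name};')

    def handle_charref(self, name):
        self.out.append(f'&#{name};')

    def handle_comment(self, data):
        self.out.append(f'<!--{data}-->')

    def handle_decl(self, decl):
        self.out.append(f'<!{decl}>')

    def handle_pi(self, data):
        self.out.append(f'<?{data}>')

def remove_tags(html, *names):
    """Strip the named tags from `html`, keeping whatever they enclosed."""
    parser = TagStripper(names)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)

print(remove_tags('<a><b>This is a <c>Test</c></b></a>', 'b'))
```

The parser's tokenizer already copes with quoted `>` characters in attributes and with upper-case tag names, which is exactly the bookkeeping that makes the regex approach painful.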