Remove HTML tags (except anchor tags) from a string using regular expressions

Nico Grubert service at zp-solutions.com
Tue Feb 1 07:03:31 EST 2005


Hello,

I want to remove all HTML tags from a string "content" except <a ...>xxx</a>.

My script reads like this:

###
import re
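# Remove every tag except those starting with "<!" (comments, doctypes).
# [^>] already matches newlines, so "([^>]|\n)*" reduces to "[^>]*", which
# also avoids heavy backtracking on unclosed tags that span many lines.
content = re.sub(r'<[^!>][^>]*>', '', content)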
###

It works: it removes all HTML tags from "content".
Unfortunately, it also removes the <a ...>xxx</a> occurrences.
Any idea how to modify it so that it removes all HTML tags except <a ...>xxx</a>?
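One possible modification, sketched here under the assumption that "anchor tag" means both the opening <a ...> and the closing </a>: put a negative lookahead right after "<" so the pattern refuses to start a match on an "a" tag. The pattern below writes "([^>]|\n)*" as "[^>]*", which matches the same text. The helper name strip_tags_except_anchor is hypothetical, and like the original pattern this is not a real HTML parser (a ">" inside an attribute value, for example, will still cut a tag short).

```python
import re

# Sketch only, not a full HTML parser.
# (?!/?a\b) is a negative lookahead: the match is refused when the tag name
# right after "<" (optionally preceded by "/") is exactly "a", so both
# <a ...> and </a> are kept. Tags starting with "<!" are left alone, as in
# the original pattern. IGNORECASE also keeps <A ...> and </A>.
TAG_EXCEPT_ANCHOR = re.compile(r'<(?!/?a\b)[^!>][^>]*>', re.IGNORECASE)


def strip_tags_except_anchor(content):
    """Hypothetical helper: remove all tags except <a ...> and </a>."""
    return TAG_EXCEPT_ANCHOR.sub('', content)


if __name__ == '__main__':
    html = '<p>See <a href="x.html">this <b>page</b></a>.</p>\n<br/>'
    print(strip_tags_except_anchor(html))
```

For the sample string this leaves `See <a href="x.html">this page</a>.` followed by the newline that sat between `</p>` and `<br/>`. The `\b` after `a` matters: without it, tags such as `<abbr>` or `<address>` would also be kept.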

Thanks in advance,
Nico
