Stripping HTML with RE
Steven Bethard
steven.bethard at gmail.com
Tue Nov 9 18:28:27 EST 2004
I wrote:
> >>> re.sub(r'</?(?!H1|H2|/H1|/H2)[^>]*>', r'', '<a>sdfsa</a>')
> 'sdfsa'
Maybe slightly better:
>>> import re
>>> re.sub(r'<(?!/?(?:H1|H2))[^>]*>', r'', '<a>sdfsa</a>')
'sdfsa'
>>> re.sub(r'<(?!/?(?:H1|H2))[^>]*>', r'', '<H1>sdfsa</a>')
'<H1>sdfsa'
>>> re.sub(r'<(?!/?(?:H1|H2))[^>]*>', r'', '<H1>sdfsa</H2>')
'<H1>sdfsa</H2>'
>>> re.sub(r'<(?!/?(?:H1|H2))[^>]*>', r'', '<H2>sdfsa</H2>')
'<H2>sdfsa</H2>'
I've just moved the optional slash inside the lookahead and grouped the
alternatives, so that I only have to write H1 and H2 once.
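
Because the tag names now appear in exactly one place, the pattern is easy
to build from a list. The following is only a minimal sketch of that idea:
the helper name strip_tags_except and its keep parameter are made up for
illustration and are not part of the re module or of the earlier message.

```python
import re

# Hypothetical helper (name and signature are assumptions for illustration).
def strip_tags_except(html, keep=("H1", "H2")):
    """Remove every tag from html except those whose name starts with an entry in keep."""
    # Escape each name in case it contains regex metacharacters, then join
    # them into a single alternation, e.g. 'H1|H2'.
    names = "|".join(re.escape(name) for name in keep)
    # Same shape as the pattern above: '<', then a negative lookahead that
    # rejects an optional '/' followed by one of the kept names, then the
    # rest of the tag up to '>'.
    pattern = r"<(?!/?(?:%s))[^>]*>" % names
    return re.sub(pattern, "", html)

print(strip_tags_except("<H1>Title</H1><p>body <b>text</b></p>"))
# prints: <H1>Title</H1>body text
```

Two limits of the pattern carry over to this sketch. The match is
case-sensitive, so <h1> would be stripped, and the lookahead only checks
how the tag name begins, so a tag such as <H10> would be kept as well.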
Steve