[Tutor] regular expression

ingo seedseven@home.nl
Sun, 31 Mar 2002 17:40:24 +0200


From an HTML file I want to strip all CSS-related stuff. Using re.sub
looks ideal because in some cases the CSS has to be replaced by
something else.
The problem I run into is that I can't find a way to match 'class=".."'
with a single expression without also matching the string when it
occurs outside a tag.

In t below, I don't want a match for class="Three", because it sits
outside any tag.

>>> import re
>>> t=r'<table class="One"><tr><td class="Two"> class="Three" </td>'
>>> pat1=re.compile(r'<.*?class=".*?".*?>')
>>> pat2=re.compile(r'class=".*?"')
>>> p=pat1.search(t)
>>> p=pat2.search(t,p.start(),p.end())
>>> p.group()
'class="One"'
>>> 

Doing it in two steps like this is possible, but then re.sub can't be
used directly. Is there a way to do it in one go?
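
One direction that might work, as a rough sketch only: re.sub accepts a
function as the replacement, so the outer pattern can match a whole tag and
the function can rewrite just that tag's text. The names tag_pat, class_pat
and strip_class are made up for illustration, and the sketch assumes that no
attribute value contains a '>' and that class values are always double-quoted.

```python
import re

t = r'<table class="One"><tr><td class="Two"> class="Three" </td>'

# Matches one whole tag (assumption: no '>' inside attribute values).
tag_pat = re.compile(r'<[^>]*>')
# Matches a double-quoted class attribute plus the whitespace before it.
class_pat = re.compile(r'\s+class="[^"]*"')

def strip_class(match):
    # Called once per matched tag; only the tag's own text is rewritten,
    # so text between tags is never touched.
    return class_pat.sub('', match.group())

result = tag_pat.sub(strip_class, t)
print(result)  # <table><tr><td> class="Three" </td>
```

To replace the attribute with something else instead of dropping it, the
inner sub call could take a different replacement string. For anything
beyond simple, regular markup, a real HTML parser is probably the safer
tool.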

Ingo