Regex - where do I make a mistake?

Peter Otten __peter__ at web.de
Fri Feb 16 08:14:11 EST 2007


Johny wrote:

> I have
> string="""<span class="test456">55</span>.
> <td><span class="test123">128</span>
> <span class="test789">170</span>
> """
> 
> where I need to replace
> <span class="test456">55</span>.
> <span class="test789">170</span>
> 
> by space.
> So I tried
> 
> #############
> import re
> string="""<td><span class="test456">55</span>.<span
> class="test123">128</span><span class="test789">170</span>
> """
> Newstring=re.sub(r'<span class="test(?!123)">.*</span>'," ",string)
> ###########
> 
> But it does NOT work.
> Can anyone explain why?

"(?!123)" is a negative "lookahead assertion", i.e. it ensures that "test"
is not followed by "123", but /doesn't/ consume any characters. So for your
regex to match, "test" must be /immediately/ followed by a '"'. That never
happens in your data, where digits always sit between "test" and the
closing quote.
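
To make this concrete, here is a minimal sketch (an illustration of the
lookahead behaviour, not a recommendation to parse HTML this way). The
single-line sample string, the names "original" and "fixed", and the
assumption that the class suffix is always digits (hence the \d+) are mine,
not part of the original question.

```python
import re

# Single-line variant of the sample data from the question.
string = ('<td><span class="test456">55</span>.'
          '<span class="test123">128</span>'
          '<span class="test789">170</span>')

# Original pattern: the lookahead consumes nothing, so the character
# right after "test" would have to be a double quote. It never is.
original = r'<span class="test(?!123)">.*</span>'
print(re.search(original, string))  # None

# One possible repair (assumes the suffix is always digits):
# - the lookahead rejects exactly the suffix 123 followed by the quote
# - \d+ then actually consumes the digits
# - .*? is non-greedy so each match stops at the first </span>
fixed = r'<span class="test(?!123")\d+">.*?</span>'
newstring = re.sub(fixed, " ", string)
print(repr(newstring))
```

With the repaired pattern the two unwanted spans become single spaces and
the "test123" span is left alone.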

Regular expressions are too low-level to use on HTML directly. Go with
BeautifulSoup instead of trying to fix the above.
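
A rough sketch of what that could look like, assuming the current
beautifulsoup4 package is installed (imported as bs4) and the standard
library's "html.parser" backend is used. The variable names and the rule
"replace every span whose class starts with test, except test123" are my
reading of the question, not something the library prescribes.

```python
from bs4 import BeautifulSoup

html = ('<td><span class="test456">55</span>.'
        '<span class="test123">128</span>'
        '<span class="test789">170</span>')

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like snapshot, so replacing tags while
# looping over it is safe.
for span in soup.find_all("span"):
    # "class" is a multi-valued attribute, so this is a list of names.
    classes = span.get("class", [])
    if any(c.startswith("test") and c != "test123" for c in classes):
        # Swap the whole tag, contents included, for a single space.
        span.replace_with(" ")

result = str(soup)
print(result)
```

The parser deals with attribute quoting, line breaks inside tags and
nesting, which is exactly what the regex approach has to get right by hand.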

Peter


