Regex problem

Gustaf Liljegren gustafl at algonet.se
Mon Oct 15 08:47:37 EDT 2001


I'm having a problem with a regex. I'm trying to match <a> or <area>
elements containing the 'href' attribute. Here's the regex:

>>> import re
>>> re_link = re.compile(r'<(a|area)\s+[^>]*href[^>]*/?>', re.I | re.M)

It works fine when I try it on these two strings:

>>> s1 = '<a href="mypage.html">'
>>> re.match(re_link, s1).group()
'<a href="mypage.html">'

>>> s2 = '<area coords="0,0,10,10" href="mypage.html">'
>>> re.match(re_link, s2).group()
'<area coords="0,0,10,10" href="mypage.html">'

But look what happens as soon as I add a space (or any other character)
before the tag:

>>> s3 = ' <a href="mypage.html">'
>>> re.match(re_link, s3).group()
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in ?
    re.match(re_link, s3).group()
AttributeError: 'None' object has no attribute 'group'
>>>

What's wrong here? Matches shouldn't have to start from the beginning of a
string.
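
A likely explanation, sketched here as a minimal example rather than a
definitive answer: in Python's re module, match() only tries the pattern at
the very start of the string, while search() scans forward for the first
place where it fits. The pattern and the string are taken from the session
above; the names 'anchored', 'found' and 'tag' are made up for illustration.

```python
import re

# Same pattern as in the session above.
re_link = re.compile(r'<(a|area)\s+[^>]*href[^>]*/?>', re.I | re.M)

s3 = ' <a href="mypage.html">'

# match() only tries the pattern at position 0, so the leading space
# makes it return None instead of a match object.
anchored = re_link.match(s3)

# search() scans the string and returns the first match found anywhere.
found = re_link.search(s3)
tag = found.group() if found else None

print(anchored)
print(tag)
```

Checking the result for None before calling group() also avoids the
AttributeError shown in the traceback.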

Gustaf Liljegren
