Regex problem
Gustaf Liljegren
gustafl at algonet.se
Mon Oct 15 08:47:37 EDT 2001
I'm having a problem with a regex. I'm trying to match <a> or <area>
elements containing the 'href' attribute. Here's the regex:
>>> import re
>>> re_link = re.compile(r'<(a|area)\s+[^>]*href[^>]*/?>', re.I | re.M)
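For readers less used to dense patterns, here is the same regex written out with re.VERBOSE so that each piece can carry a comment. This is only a sketch: the name re_link_verbose is hypothetical, and the flags re.I | re.M are copied from the post. re.M has no effect here because the pattern contains no ^ or $.

```python
import re

# Same pattern as re_link in the post, written with re.VERBOSE (re.X).
# In verbose mode, unescaped whitespace and #-comments in the pattern are
# ignored, so this layout matches exactly the same strings as the original.
re_link_verbose = re.compile(r"""
    <           # opening angle bracket of the tag
    (a|area)    # tag name: 'a' or 'area' (captured as group 1)
    \s+         # at least one whitespace character after the tag name
    [^>]*       # any attributes before href, without leaving the tag
    href        # the literal text 'href'
    [^>]*       # the rest of the tag up to the closing bracket
    /?          # an optional slash before the closing bracket
    >           # closing angle bracket
    """, re.I | re.M | re.X)

s1 = '<a href="mypage.html">'
s2 = '<area coords="0,0,10,10" href="mypage.html">'
print(re_link_verbose.match(s1).group())  # <a href="mypage.html">
print(re_link_verbose.match(s2).group(1))  # area
```

Because the greedy [^>]* already consumes any slash before the closing bracket, the trailing /? only ever matches the empty string. It is harmless, but effectively redundant.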
It works fine when I try it on these two strings:
>>> s1 = '<a href="mypage.html">'
>>> re.match(re_link, s1).group()
'<a href="mypage.html">'
>>> s2 = '<area coords="0,0,10,10" href="mypage.html">'
>>> re.match(re_link, s2).group()
'<area coords="0,0,10,10" href="mypage.html">'
But look what happens as soon as I add a space (or any other character)
before the tag:
>>> s3 = ' <a href="mypage.html">'
>>> re.match(re_link, s3).group()
Traceback (most recent call last):
  File "<pyshell#7>", line 1, in ?
    re.match(re_link, s3).group()
AttributeError: 'None' object has no attribute 'group'
>>>
What's wrong here? Matches shouldn't have to start from the beginning of a
string.
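A likely explanation, offered as a sketch: re.match (and the compiled pattern's .match method) only tries to match at the very beginning of the string, while re.search scans forward for the first position where the pattern matches. The example below reuses the regex from the post, checks for None instead of calling .group() directly (which is what raises the AttributeError), and uses finditer to collect every link. The names hit, page and links, and the sample page string, are illustrative only.

```python
import re

re_link = re.compile(r'<(a|area)\s+[^>]*href[^>]*/?>', re.I | re.M)
s3 = ' <a href="mypage.html">'

# match() is anchored at the start of the string, so the leading space
# makes it fail and it returns None.
print(re_link.match(s3))  # None

# search() scans forward and returns the first match anywhere in the string.
hit = re_link.search(s3)
if hit is not None:
    print(hit.group())  # <a href="mypage.html">

# finditer() yields every non-overlapping match, which suits scanning a page.
# The last tag has no href attribute, so it is not matched.
page = 'x <a href="one.html"> y <area shape="rect" href="two.html"> z <a name="top">'
links = [m.group() for m in re_link.finditer(page)]
print(links)
```

findall would be less convenient here: since the pattern contains one capturing group, it returns only the captured tag names ('a', 'area') rather than the whole tags.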
Gustaf Liljegren
More information about the Python-list mailing list