Regex problem

Gustaf Liljegren gustafl at algonet.se
Mon Oct 15 00:05:55 EDT 2001


[Had trouble with the news server tonight. Sorry if you see this message 
more than once.]

I'm trying to match either of the HTML elements <a> or <area>, containing 
an 'href' attribute. Here's the regex I've made:

>>> import re
>>> re_link = re.compile(r'<(area|a)[^>]+href=".*"[^>]*/?>', re.I | re.M)

Works fine when I try it on a matching string:

>>> s1 = '<a href="page.html">'
>>> re.match(re_link, s1).group()
'<a href="page.html">'

But as soon as I add a leading space, it stops matching:

>>> s2 = ' <a href="page.html">'
>>> re.match(re_link, s2).group()
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in ?
    re.match(re_link, s2).group()
AttributeError: 'None' object has no attribute 'group'
>>>

Regexes don't always have to match from the beginning! What's wrong here?

Gustaf Liljegren
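
For comparison, here is a minimal sketch of the likely cause, assuming the goal is to find the tag anywhere in the string. In Python's re module, match() only tries the pattern at the very start of the string (even with re.M set), while search() scans forward for the first position where it matches. The names `anchored` and `found` are made up for this illustration; the pattern and test string are the ones from the post.

```python
import re

# Same pattern and flags as in the post.
re_link = re.compile(r'<(area|a)[^>]+href=".*"[^>]*/?>', re.I | re.M)

s2 = ' <a href="page.html">'

# match() tries the pattern only at position 0, so the leading
# space makes it return None.
anchored = re_link.match(s2)

# search() scans the string and returns the first match it finds,
# wherever that match starts.
found = re_link.search(s2)

print(anchored)

# Both calls return None on failure, so check before calling .group().
if found is not None:
    print(found.group())
```

Checking the result for None before calling group() also avoids the AttributeError shown in the traceback when nothing matches.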


