Regex mystery: matching <a> or <area>

Richard Jones richard at bizarsoftware.com.au
Mon Oct 15 23:20:31 EDT 2001


On Monday 15 October 2001 11:25, Gustaf Liljegren wrote:
> I'm trying to match HTML <a> or <area> elements with the 'href' attribute.
> For some reason, the regex can't find any match if I put a sample match in
> the context of something else. I only need to add a space before, as in the
> string 's3' below.
>
> >>> re_link = re.compile(r'<(a|area)[^>]+href.*/?>', re.I | re.M)
> >>> s1 = '<a href="mypage.html">'
> >>> s2 = '<area coords="0,0,5,5" href="tiny.html">'
> >>> s3 = ' <a href="space.html">'
> >>> re.match(re_link, s1).group()
>
> '<a href="mypage.html">'
>
> >>> re.match(re_link, s2).group()
>
> '<area coords="0,0,5,5" href="tiny.html">'
>
> >>> re.match(re_link, s3).group()
>
> Traceback (most recent call last):
>   File "<pyshell#49>", line 1, in ?
>     re.match(re_link, s3).group()
> AttributeError: 'None' object has no attribute 'group'


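Change "match" to "search" in the above. match() only tries the pattern at
the very start of the string, so the leading space in s3 makes it return
None. Note the distinction in the library ref: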


"""
search(pattern, string[, flags])  Scan through string looking for a location
  where the regular expression pattern produces a match, and return a
  corresponding MatchObject instance. Return None if no position in the
  string matches the pattern; note that this is different from finding a
  zero-length match at some point in the string.

match(pattern, string[, flags])  If zero or more characters at the beginning
  of string match the regular expression pattern, return a corresponding
  MatchObject instance. Return None if the string does not match the pattern;
  note that this is different from a zero-length match. 

Note: If you want to locate a match anywhere in string, use search() instead.
"""
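
To make the difference concrete, here is a minimal sketch using the pattern
and the s3 string from the original post. The variable names at_start,
anywhere and found are only for illustration, and it calls the methods on the
compiled pattern rather than re.match(re_link, ...), which behaves the same
way here.

```python
import re

# Same pattern and flags as in the original post.
re_link = re.compile(r'<(a|area)[^>]+href.*/?>', re.I | re.M)

# The problem string: note the leading space before the tag.
s3 = ' <a href="space.html">'

# match() only tries the pattern at position 0. The first character is a
# space, not '<', so there is no match and the result is None.
at_start = re_link.match(s3)

# search() scans forward through the string and finds the tag at position 1.
anywhere = re_link.search(s3)
found = anywhere.group()

print(at_start)   # None
print(found)      # <a href="space.html">
```

Since both calls return None on failure, it is worth testing the result
before calling .group() on it, otherwise you get the AttributeError shown in
the traceback.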


     Richard



