Regex problem
andres
andres at corrada.com
Mon Oct 15 09:02:32 EDT 2001
Hi Gustaf,
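"re.match" only matches at the beginning of the string (even with re.M);
it does not scan forward. Use "re.search" to search the whole string.
Alternatively, you could put "\s*" at the beginning of your pattern,
though that only skips leading whitespace, not other characters.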
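Here is a small sketch of the difference, using the pattern and the s3
string from your message. The "html" sample string and the variable names
"found_text" and "all_links" are made up for illustration; "finditer" is
just one way to collect every link tag, not the only one.

```python
import re

# Pattern and s3 are taken from the quoted message.
re_link = re.compile(r'<(a|area)\s+[^>]*href[^>]*/?>', re.I | re.M)
s3 = ' <a href="mypage.html">'

# match() only tries position 0, so the leading space makes it return None.
print(re_link.match(s3))

# search() scans forward until the pattern matches somewhere in the string.
found = re_link.search(s3)
found_text = found.group() if found else None
print(found_text)

# Hypothetical sample text: two tags with href, one without.
html = 'x <a href="a.html"> y <AREA coords="0,0" href="b.html"> z <a name="top">'

# finditer() yields every non-overlapping match, so the tag without href is skipped.
all_links = [m.group() for m in re_link.finditer(html)]
print(all_links)
```

Checking the result against None before calling .group() also avoids the
AttributeError you got in your traceback.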
Gustaf Liljegren writes:
> I'm having a problem with a regex. I'm trying to match <a> or <area>
> elements containing the 'href' attribute. Here's the regex:
>
> >>> import re
> >>> re_link = re.compile(r'<(a|area)\s+[^>]*href[^>]*/?>', re.I | re.M)
>
> It works fine when I try it on these two strings:
>
> >>> s1 = '<a href="mypage.html">'
> >>> re.match(re_link, s1).group()
> '<a href="mypage.html">'
>
> >>> s2 = '<area coords="0,0,10,10" href="mypage.html">'
> >>> re.match(re_link, s2).group()
> '<area coords="0,0,10,10" href="mypage.html">'
>
> But look what happens as soon as I add a space (or any other character)
> before:
>
> >>> s3 = ' <a href="mypage.html">'
> >>> re.match(re_link, s3).group()
> Traceback (most recent call last):
>   File "<pyshell#7>", line 1, in ?
>     re.match(re_link, s3).group()
> AttributeError: 'None' object has no attribute 'group'
> >>>
>
> What's wrong here? Matches shouldn't have to start from the beginning of a
> string.
>
> Gustaf Liljegren