re.findall() is skipping matching characters

Gustaf Liljegren gustafl at algonet.se
Mon Oct 15 16:17:45 EDT 2001


Thanks for helping me out with matching/searching before. Unfortunately, 
the example I gave was a little too basic, so I need some more help.

>>> re.search(r'<(a)', '<a href="page.html">').group()
'<a'

The search() function matches the full expression: both the '<' and the 
'(a)', which stands in for an alternation between several HTML elements. 
The match() function behaves the same way:

>>> re.match(r'<(a)', '<a href="page.html">').group()
'<a'
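
As a side note on what group() reports in the two calls above: called with 
no argument it returns group 0, the whole match, while group(1) returns only 
the parenthesised part. A minimal sketch, reusing the same pattern and 
string (the variable names are just for illustration):

```python
import re

m = re.search(r'<(a)', '<a href="page.html">')

# group() with no argument is the same as group(0): the entire match.
whole_match = m.group()

# group(1) is only the text captured by the first pair of parentheses.
first_group = m.group(1)

# groups() collects every capturing group into a tuple.
all_groups = m.groups()

print(whole_match, first_group, all_groups)
```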

But look what happens when I use the findall() function:

>>> re.findall(r'<(a)', '<a href="page.html">')
['a']

Why does findall() skip the '<'? I want to pick out full strings like '<a 
href="page.html">' or '<area ... href="page.html">' and put them in a list. 
I imagine the full regex should look something like this, following 
today's standards:

re_link = re.compile(r'<(a|area)\s[^>]*href[^>]*/?>', re.I | re.M)

Where's the problem?
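
For reference, the behaviour seems to come from how findall() treats 
groups: when the pattern contains exactly one capturing group, it returns 
the text of that group rather than the whole match. The sketch below shows 
two possible ways around that, a non-capturing group (?:...) or finditer() 
with group(0). The sample string `html` and the variable names are made up 
for illustration and are not part of the original problem:

```python
import re

tag = '<a href="page.html">'

# One capturing group: findall() returns the group's text only.
captured = re.findall(r'<(a)', tag)

# Non-capturing group: findall() returns the whole match instead.
whole = re.findall(r'<(?:a)', tag)

# Hypothetical sample input, invented for this sketch.
html = ('<p><a href="page.html">x</a> '
        '<AREA shape="rect" href="map.html"> <a name="top"></p>')

# Option 1: same regex as above, but with a non-capturing alternation.
re_link = re.compile(r'<(?:a|area)\s[^>]*href[^>]*/?>', re.I | re.M)
links = re_link.findall(html)

# Option 2: keep the capturing group and ask each match for group(0).
re_link_grouped = re.compile(r'<(a|area)\s[^>]*href[^>]*/?>', re.I | re.M)
links_via_iter = [m.group(0) for m in re_link_grouped.finditer(html)]

print(captured, whole)
print(links)
```

Either option leaves the rest of the expression unchanged; the second also 
keeps the element name available through group(1) if it is needed later.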


