a regular expression question
Luke
lrl at cox.net
Sat Mar 22 02:31:38 EST 2003
I suppose this isn't really a Python question so much as a regular expression
question, but I'm using Python to do it, so... I'm trying to parse link data
from a web page that looks like this:
<a href="foo1">1</a> abc <a href="foo2">2</a> def <a href="foo3">3</a> ghi <a href="foo4">4</a> jkl
With regular expressions like the ones below (where the variable 'text' holds
the sample above), re1 captures the numbers but not the text. Why is that?
If I use re2, it works, but it only gets the odd-numbered links, since each
match consumes the next '<a' and findall's matches can't overlap. Is there a
way to modify re1 to capture the text, or is there a way to get overlapping
matches with Python's re engine?
>>> import re
>>> text = '<a href="foo1">1</a> abc <a href="foo2">2</a> def <a href="foo3">3</a> ghi <a href="foo4">4</a> jkl'
>>> re1 = re.compile(r"<a .*?>([0-9]+?)</a>(.*?)")
>>> matches = re1.findall(text)
>>> matches
[('1', ''), ('2', ''), ('3', ''), ('4', '')]
>>> re2 = re.compile(r"<a .*?>([0-9]+?)</a>(.*?)<a")
>>> matches = re2.findall(text)
>>> matches
[('1', ' abc '), ('3', ' ghi ')]
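A minimal sketch of one possible fix, assuming the sample is a single line (as re2's output suggests) and that each link's text runs up to the next "<a " or the end of the string. In re1 the trailing lazy (.*?) has nothing after it that forces it to grow, so the empty string always satisfies it. Ending the pattern with a lookahead, (?=<a |$), requires the next link (or the end of the string) to follow without consuming it, so the next findall match can still start there. The name re3 is introduced here only for illustration.

```python
import re

# The sample from the post as one string (the mail client wrapped it).
text = ('<a href="foo1">1</a> abc <a href="foo2">2</a> def '
        '<a href="foo3">3</a> ghi <a href="foo4">4</a> jkl')

# re1 as posted: the final lazy (.*?) is not followed by anything, so the
# shortest match, the empty string, is accepted every time.
re1 = re.compile(r'<a .*?>([0-9]+?)</a>(.*?)')

# Hypothetical variant re3: same pattern, but the text group must be followed
# by "<a " or the end of the string. The lookahead checks for that without
# consuming it, so consecutive matches do not swallow each other's "<a".
re3 = re.compile(r'<a .*?>([0-9]+?)</a>(.*?)(?=<a |$)')

print(re1.findall(text))
print(re3.findall(text))
```

With this sample, re3 returns all four (number, text) pairs, including ' jkl' after the last link, because $ also ends the final text group.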
Thanks
More information about the Python-list mailing list