Searching for Regular Expressions in a string WITH overlap

Ben bmnave at gmail.com
Thu Nov 20 19:31:54 EST 2008


I apologize in advance for the newbie question.  I'm trying to figure
out a way to find all of the occurrences of a regular expression in a
string including the overlapping ones.

For example, given the string 123456789

I'd like to use the RE ((2)|(4))[0-9]{3} to get the following matches:

2345
4567

Here's what I'm trying so far:
<code>
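#!/usr/bin/env python

import re

string = "123456789"

# Four capturing groups: the whole match, the (2)|(4) alternation,
# and each of its two branches.
pattern = '(((2)|(4))[0-9]{3})'

r1 = re.compile(pattern)

stringList = r1.findall(string)

# Loop variable renamed so it no longer shadows the input string.
for match in stringList:
    print "string type is:", type(match)
    print "string is:", match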
</code>

Which produces:
<code>
string type is: <type 'tuple'>
string is: ('2345', '2', '2', '')
</code>

I understand that the findall method only returns the non-overlapping
matches.  I just haven't figured out a function that gives me the
matches including the overlap.  Can anyone point me in the right
direction?  I'd also really like to understand why it returns a tuple
and what the '2', '2' entries refer to.
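
One direction that may work, offered as a rough sketch rather than a definitive answer: wrap the whole pattern in a zero-width lookahead, so that each match consumes no characters and the scan can restart one position later. The helper name find_overlapping below is hypothetical, not something from the re module. The second half of the sketch shows the behaviour that seems to explain the tuple: when a pattern contains more than one capturing group, findall returns one tuple per match, with the groups numbered by their opening parentheses, and a group that did not take part in the match shows up as an empty string.

```python
import re


def find_overlapping(pattern, text):
    """Return every match of pattern in text, overlapping ones included.

    Hypothetical helper: the pattern is wrapped in a lookahead, (?=(...)),
    so each match has zero width and the search resumes one character
    later instead of after the end of the previous match.
    """
    lookahead = re.compile("(?=(" + pattern + "))")
    # Group 1 is the wrapper added above, so it always holds the full match.
    return [m.group(1) for m in lookahead.finditer(text)]


text = "123456789"

overlapping = find_overlapping("((2)|(4))[0-9]{3}", text)
print(overlapping)

# The original pattern has four capturing groups, so findall yields a
# 4-tuple per match: (whole match, alternation, "2" branch, "4" branch).
# The "4" branch did not participate here, hence the empty string.
grouped = re.findall("(((2)|(4))[0-9]{3})", text)
print(grouped)
```

One caveat with this sketch: because the wrapper adds a group in front, any numbered backreference inside the supplied pattern would shift by one.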

Thanks for your help!
-Ben


