Overlapping Regular Expression Matches With findall()

Fredrik Lundh fredrik at pythonware.com
Thu Dec 15 16:46:14 EST 2005


Mystilleef wrote:

> Thanks for your response. I was going by the definition in
> the manual.

"non-overlapping" in that context means that if you e.g. search for "(ba)+"
in the string "bababa", you get one match ("bababa"), not three or six.
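
as a rough illustration of the difference (a sketch, not part of the manual's wording): the names greedy, single and overlapping are made up for this example, and the lookahead pattern is just one common way to get a match per starting position.

```python
import re

text = "bababa"

# one non-overlapping match: the greedy "+" consumes all three repetitions
# (a non-capturing group is used so group() is the whole match)
greedy = [m.group() for m in re.finditer(r"(?:ba)+", text)]

# without the repetition, scanning resumes after each match, giving three
single = re.findall(r"ba", text)

# a zero-width lookahead consumes nothing, so a match can start at every
# position; the capturing group inside it carries the overlapping text
overlapping = re.findall(r"(?=((?:ba)+))", text)

print(greedy)       # ['bababa']
print(single)       # ['ba', 'ba', 'ba']
print(overlapping)  # ['bababa', 'baba', 'ba']
```

the lookahead variant still reports only the longest match at each starting position, so it gives three results here rather than all six possible substrings.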

in your case, it sounds like you want a search for "ba" to return only one
match, no matter how many times it occurs in the string.

> I know I can filter the list containing found matches myself, but that
> is somewhat expensive for a list containing thousands of matches.

if the order doesn't matter, you don't have to build a list:

>>> import re
>>> text = "cat catched catnip cat catatonic cat cat cat kat"
>>> set(m.group() for m in re.finditer(r"cat\w*", text))
set(['catatonic', 'catnip', 'catched', 'cat'])

if you need to preserve the order, you could use a combination of a
list and a set (or a dictionary):

>>> s = set()   # words already seen
>>> w = []      # unique words, in order of first appearance
>>> for m in re.finditer(r"cat\w*", text):
...     word = m.group()
...     if word not in s:
...         s.add(word)
...         w.append(word)
...
>>> w
['cat', 'catched', 'catnip', 'catatonic']
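
for the dictionary variant, a minimal sketch, assuming Python 3.7 or later (where plain dicts are guaranteed to keep insertion order; that guarantee did not exist when this was posted). the name unique_in_order is made up for the example.

```python
import re

text = "cat catched catnip cat catatonic cat cat cat kat"

# dict keys are unique, and in Python 3.7+ they keep insertion order,
# so fromkeys() drops repeats while preserving first appearance
unique_in_order = list(
    dict.fromkeys(m.group() for m in re.finditer(r"cat\w*", text))
)

print(unique_in_order)  # ['cat', 'catched', 'catnip', 'catatonic']
```

this still walks the matches only once and never builds the full list of duplicates, since finditer hands over one match at a time.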

</F>





