trying to find repeated substrings with regular expression

Raymond Hettinger python at rcn.com
Tue Mar 14 04:13:02 EST 2006


[Robert Dodier]
> I'm trying to find substrings that look like 'FOO blah blah blah'
> in a string. For example, given 'blah FOO blah1a blah1b FOO blah2
> FOO blah3a blah3b blah3b' I want to get three substrings,
> 'FOO blah1a blah1b', 'FOO blah2', and 'FOO blah3a blah3b blah3b'.

No need for regular expressions on this one:

>>> s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'
>>> ['FOO' + tail for tail in s.split('FOO')[1:]]
['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']
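Note that the pieces above keep the trailing space that preceded the next
'FOO'. If the exact strings from the question are wanted, one possible
sketch is to strip each piece. The function name split_on_marker and its
marker parameter are made up here for illustration, not part of any library:

```python
def split_on_marker(text, marker="FOO"):
    """Return each chunk of text that begins with marker, whitespace-trimmed."""
    # split() yields the text before the first marker as element 0;
    # the [1:] slice discards it, leaving only the tails after each marker.
    tails = text.split(marker)[1:]
    # Re-attach the marker that split() consumed, then trim surrounding spaces.
    return [(marker + tail).strip() for tail in tails]


s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'
print(split_on_marker(s))
```

A string that does not contain the marker at all gives an empty list,
because split() then returns a single element and the slice drops it.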


>
> I've tried numerous variations on '.*(FOO((?!FOO).)*)+.*'
> and everything I've tried either matches too much or too little.

The regular expression way is to match the target phrase followed by any
text, stopping just before the next target phrase or the end of the
string.  The phrase and the text are captured in a group; the stopping
point is a lookahead, so it is neither consumed nor included in the
result.  The any-text section is non-greedy:

>>> import re
>>> re.findall('(FOO.*?)(?=FOO|$)', s)
['FOO blah1a blah1b ', 'FOO blah2 ', 'FOO blah3a blah3b blah3b']
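If the target phrase is not a fixed literal like 'FOO', or the text may
span several lines, the same lookahead idea can be wrapped up roughly as
follows. This is only a sketch under those assumptions: find_marked and its
marker parameter are hypothetical names, re.escape is used in case the
marker contains regex metacharacters, and re.DOTALL lets '.' cross newlines:

```python
import re


def find_marked(text, marker="FOO"):
    """Return each run of text starting at marker, up to the next marker or the end."""
    # Escape the marker so characters such as '.' are matched literally.
    marker_re = re.escape(marker)
    # Non-greedy body, then a lookahead for the next marker or end of string.
    # The lookahead consumes nothing, so the next match can start at that marker.
    pattern = re.compile(rf'{marker_re}.*?(?={marker_re}|$)', re.DOTALL)
    # Trim the whitespace that separated a piece from the following marker.
    return [piece.strip() for piece in pattern.findall(text)]


s = 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b'
print(find_marked(s))
```

Without a capturing group, findall() returns the whole match, which is the
same text the group captured in the one-liner above.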


Raymond



