[Tutor] regular expression question

Kent Johnson kent37 at tds.net
Tue Apr 28 12:36:47 CEST 2009


On Tue, Apr 28, 2009 at 4:03 AM, Kelie <kf9150 at gmail.com> wrote:
> Hello,
>
> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern so
> that the return value will be 'abc789jk'? In other words, I want to find the
> pattern 'abc' that is closest to 'jk'. Here the string '123', '45' and '789' are
> just examples. They are actually quite different in the string that I'm working
> with.
>
> import re
> s = 'abc123abc45abc789jk'
> p = r'abc.+jk'
> lst = re.findall(p, s)
> print lst[0]

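re.findall() won't work here because it scans left to right and
returns non-overlapping matches. The first match starts at the first
'abc' and runs all the way to 'jk', so the later 'abc's are never
tried as starting points.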
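
A quick sketch of the problem, using the string from your example. The
non-greedy variant r'abc.+?jk' is my addition, not something from your
code; it is included only to show that making .+ lazy does not help,
since the match still begins at the leftmost 'abc'. print() is used with
a single argument so the snippet runs on Python 2 or 3.

```python
import re

s = 'abc123abc45abc789jk'

# Greedy pattern from the original question.
greedy = re.findall(r'abc.+jk', s)

# Non-greedy variant (an illustrative assumption, not in the original).
# The match still starts at the first 'abc', and the only 'jk' is at the
# end of the string, so the result is the same.
lazy = re.findall(r'abc.+?jk', s)

print(greedy)
print(lazy)
```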

If there is a character in the leading 'abc' which cannot occur in the
middle section, change .+ to a character class that excludes it. For
example, r'abc[^a]+jk' works with your example.
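
Here is that pattern run against your string, as a minimal sketch. The
second string, 'abc1abc7a9jk', is a made-up input to show the
limitation: as soon as an 'a' can appear in the middle section, the
pattern finds nothing.

```python
import re

p = r'abc[^a]+jk'

# The example string from the question: only the last 'abc' can reach
# 'jk' without crossing another 'a'.
s = 'abc123abc45abc789jk'
matches = re.findall(p, s)
print(matches[0])

# Hypothetical input where the middle section contains an 'a'.
# [^a]+ cannot cross it, so there is no match at all.
no_matches = re.findall(p, 'abc1abc7a9jk')
print(no_matches)
```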

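Another possibility is to search repeatedly, restarting one character
past the start of the previous match, and keep the last match found.
Something like this:

p = re.compile(r'abc.+jk')
lastMatch = None
i = 0
while i < len(s):
  m = p.search(s, i)      # leftmost match at or after position i
  if m is None:
    break                 # no more matches; lastMatch is the answer
  lastMatch = m.group()
  i = m.start() + 1       # retry just past the start of this match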

print lastMatch
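
If you need this in more than one place, the same loop can be wrapped in
a function. This is only a sketch: the name last_match and its
signature are my own invention, not anything from the re module. It
returns None when the pattern does not match at all.

```python
import re

def last_match(pattern, s):
    """Return the text of the match that starts furthest right, or None.

    Hypothetical helper; it repeats the search from one character past
    the start of each match until no match is left.
    """
    p = re.compile(pattern)
    result = None
    i = 0
    while i < len(s):
        m = p.search(s, i)
        if m is None:
            break
        result = m.group()
        i = m.start() + 1
    return result

print(last_match(r'abc.+jk', 'abc123abc45abc789jk'))
print(last_match(r'abc.+jk', 'no match here'))
```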

Kent

