[Tutor] regular expression question

Andre Engels andreengels at gmail.com
Tue Apr 28 11:16:28 CEST 2009


2009/4/28 Marek Spociński at go2.pl,Poland <marek_sp at 10g.pl>:
>> Hello,
>>
>> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern so
>> that the return value will be 'abc789jk'? In other words, I want to find the
>> pattern 'abc' that is closest to 'jk'. Here the string '123', '45' and '789' are
>> just examples. They are actually quite different in the string that I'm working
>> with.
>>
>> import re
>> s = 'abc123abc45abc789jk'
>> p = r'abc.+jk'
>> lst = re.findall(p, s)
>> print(lst[0])
>
> I suggest using r'abc.+?jk' instead.
>
> The additional ? makes the preceding '.+' non-greedy, so instead of matching as long a string as it can, it matches as short a string as possible.

That was my first idea too, but it does not work in this case,
because Python will still try to _start_ the match as early as
possible. The leftmost 'abc' can still reach 'jk', so the non-greedy
pattern matches the whole string anyway. To use .+? one would have to
reverse the string, apply the reversed regular expression to the
result, and then reverse the match again, which looks like a rather
roundabout way of doing things.
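
A minimal sketch of both points, using the sample string from the
question. The variable names are only for illustration, and the
reversed pattern is written out by hand, which only works this simply
because the pattern consists of literal text around '.+?':

```python
import re

s = 'abc123abc45abc789jk'

# Non-greedy does not help: the match still starts at the leftmost 'abc',
# and '.+?' simply grows until it reaches the only 'jk'.
nongreedy = re.findall(r'abc.+?jk', s)[0]
print(nongreedy)   # abc123abc45abc789jk

# Roundabout approach: reverse the string, match the reversed pattern
# non-greedily, then reverse the match back.
reversed_match = re.search(r'kj.+?cba', s[::-1])
closest = reversed_match.group(0)[::-1]
print(closest)     # abc789jk
```

In the reversed string 'kj' comes first, so the non-greedy '.+?' stops
at the nearest 'cba', which is the 'abc' closest to 'jk' in the
original.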



-- 
André Engels, andreengels at gmail.com

