Confused about 'positive lookbehind assertion'
Duncan Booth
duncan.booth at invalid.invalid
Wed Sep 26 06:02:31 EDT 2007
"Andrew Durdin" <adurdin at gmail.com> wrote:
> On 9/25/07, Karthik Gurusamy <kar1107 at gmail.com> wrote:
>>
>> Any idea what this positive lookbehind achieves which can't be done
>> without it.
>> I remember cases where positive look-ahead is useful.
>>
>> In the above example, r.search('abcdef') does the job of ensuring
>> 'def' is preceded by 'abc'.
>
> AFAICT the only benefit I can see is that the lookbehind isn't
> captured, so (a) it's not included in match.group(n), and (b)
> match.start(), match.end(), and match.span() return the offsets of the
> bit you actually wanted to capture.
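The point about captured text and offsets can be checked directly. This is a minimal sketch that compares a plain pattern with a lookbehind version. The input string 'xxabcdef' is made up for illustration, and the lookbehind pattern is assumed to be the `(?<=abc)def` one under discussion:

```python
import re

text = 'xxabcdef'

# Plain pattern: 'abc' is consumed, so it becomes part of the match.
plain = re.search(r'abcdef', text)

# Lookbehind: 'abc' is only checked, not consumed, so the match
# (and its offsets) cover just 'def'.
behind = re.search(r'(?<=abc)def', text)

print(plain.group(0), plain.span())    # abcdef (2, 8)
print(behind.group(0), behind.span())  # def (5, 8)
```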
It also makes a difference when you are searching for repeated matches (e.g. with re.findall) and
the lookbehind overlaps the text consumed by the previous match:
>>> import re
>>> re.findall('(?<=abc)(abc|def)', 'abcabcdef')
['abc', 'def']
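For comparison, here is a small sketch of the same search with 'abc' consumed instead of looked behind. Once the first match swallows 'abcabc', the remaining 'def' no longer has an unconsumed 'abc' in front of it, so the second result is lost:

```python
import re

text = 'abcabcdef'

# Lookbehind: the guarding 'abc' is not consumed, so it can be shared
# with the previous match.
with_lookbehind = re.findall(r'(?<=abc)(abc|def)', text)

# Consuming version: the first match eats 'abcabc', leaving only 'def',
# which has no 'abc' of its own before it.
without_lookbehind = re.findall(r'abc(abc|def)', text)

print(with_lookbehind)     # ['abc', 'def']
print(without_lookbehind)  # ['abc']
```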
I think, though, that its real use is not at the beginning of a pattern but
in the middle: you might have a greedy subpattern that matches too easily
(perhaps so that it can form a group), followed by a guard that looks back
at what it just matched. For example:
>>> re.findall(r'(\d*)((?<=1)a|(?<=2)b)', '111a 111b 222b')
[('111', 'a'), ('222', 'b')]
This pattern finds all numbers ending in 1 followed by 'a', or ending in 2
followed by 'b', and it returns groups containing the whole number and the
trailing letter. You could write '\d*(1a|2b)', but that doesn't give you the
correct groups: the last digit ends up in the same group as the letter, and
the number itself is not captured.
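The three variants can be run side by side. This sketch adds one assumed variant, `(\d*)(1a|2b)`, which puts a group around the number. Even then, backtracking hands the number's last digit to the second group, which the lookbehind guard avoids:

```python
import re

text = '111a 111b 222b'

# Guard version: \d* grabs the whole number, then the lookbehind
# checks its last digit without consuming it.
guarded = re.findall(r'(\d*)((?<=1)a|(?<=2)b)', text)

# No lookbehind, no group on the number: only '1a' / '2b' come back.
no_group = re.findall(r'\d*(1a|2b)', text)

# No lookbehind, grouped number: \d* must give back its last digit
# so that (1a|2b) can match, splitting the number across groups.
grouped = re.findall(r'(\d*)(1a|2b)', text)

print(guarded)   # [('111', 'a'), ('222', 'b')]
print(no_group)  # ['1a', '2b']
print(grouped)   # [('11', '1a'), ('22', '2b')]
```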