Confused about 'positive lookbehind assertion'

Wed Sep 26 06:02:31 EDT 2007

"Andrew Durdin" <adurdin at gmail.com> wrote:

> On 9/25/07, Karthik Gurusamy <kar1107 at gmail.com> wrote:
>>
>> Any idea what this positive lookbehind achieves which can't be done
>> without it.
>> I remember cases where positive look-ahead is useful.
>>
>> In the above example, r.search('abcdef') does the job of ensuring
>> 'def' is preceded by 'abc'.
> 
> AFAICT the only benefit I can see is that the lookbehind isn't
> captured, so (a) it's not included in match.group(n), and (b)
> match.start(), match.end(), and match.span() return the offsets of the
> bit you actually wanted to capture.
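A small standalone sketch of Andrew's point, reusing the 'abcdef' string from the earlier example (the variable names are only for illustration). It compares the match objects from a lookbehind pattern, a plain pattern, and a pattern that captures the wanted part in a group:

```python
import re

text = "abcdef"

# Lookbehind: 'abc' must precede the match but is not consumed,
# so group(0) and span() cover only 'def'.
behind = re.search(r"(?<=abc)def", text)

# Plain pattern: 'abc' is consumed and becomes part of the match.
plain = re.search(r"abcdef", text)

# Capturing group: the offsets of 'def' are still available,
# but you have to ask for group 1 rather than the whole match.
grouped = re.search(r"abc(def)", text)

print(behind.group(0), behind.span())     # def (3, 6)
print(plain.group(0), plain.span())       # abcdef (0, 6)
print(grouped.group(1), grouped.span(1))  # def (3, 6)
```

So for a single search the lookbehind mainly saves you from indexing into a group.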

It also makes a difference if you are collecting multiple matches (e.g. with findall) and 
the lookbehind overlaps the previous match:

>>> re.findall('(?<=abc)(abc|def)', 'abcabcdef')
['abc', 'def']
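For contrast, here is a sketch of the same search written without the lookbehind, so the leading 'abc' is consumed as part of each match:

```python
import re

text = "abcabcdef"

# The lookbehind only checks for a preceding 'abc'; it does not consume it,
# so the 'abc' matched first can also serve as the context for 'def'.
with_lookbehind = re.findall(r"(?<=abc)(abc|def)", text)

# Here the leading 'abc' is consumed. After matching 'abcabc' the scan
# resumes at 'def', and there is no unconsumed 'abc' in front of it.
without_lookbehind = re.findall(r"abc(abc|def)", text)

print(with_lookbehind)     # ['abc', 'def']
print(without_lookbehind)  # ['abc']
```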

I think, though, that its real use is not at the beginning of a pattern but 
in the middle: you might have a greedy pattern that matches too easily 
(perhaps to capture a group), followed by a guard that looks behind. For example:

>>> re.findall(r'(\d*)((?<=1)a|(?<=2)b)', '111a 111b 222b')
[('111', 'a'), ('222', 'b')]

This pattern finds all numbers ending in 1 followed by 'a', or ending in 2 
followed by 'b', and returns groups containing the whole number and the 
trailing letter. You could write '\d*(1a|2b)', but then the final digit is 
consumed along with the letter instead of staying with the number, so you 
don't get the groups you want.
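A sketch comparing the guarded pattern with the unguarded alternative, using two capturing groups in both so the difference in grouping is visible:

```python
import re

text = "111a 111b 222b"

# Guarded version: \d* greedily takes the whole number, then a lookbehind
# checks its last digit without moving that digit out of group 1.
guarded = re.findall(r"(\d*)((?<=1)a|(?<=2)b)", text)

# Unguarded version: the last digit has to be consumed by group 2,
# so \d* backtracks and the number is split across the two groups.
unguarded = re.findall(r"(\d*)(1a|2b)", text)

print(guarded)    # [('111', 'a'), ('222', 'b')]
print(unguarded)  # [('11', '1a'), ('22', '2b')]
```

Both patterns accept the same substrings; only the guarded one keeps each number intact in its own group.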