Help debugging code - Negative lookahead problem

MRAB python at mrabarnett.plus.com
Sun Feb 26 14:07:10 EST 2017


On 2017-02-26 17:15, michael.gauthier.uni at gmail.com wrote:
> Hi MRAB,
>
> Thanks for taking time to look at my problem!
>
> I tried your solution:
>
> r"\d{2}\s?(?=(?:years old\s?|yo\s?|yr old\s?|y o\s?|yrs  old\s?|year
> old\s?)(?!son|daughter|kid|child))"
>
> but unfortunately it does not seem to work. Also, I tried adding the
> negative lookaheads after every one of the alternatives, but it does not
> work either, so the problem does not seem to be that the negative
> lookahead applies only to the last alternative... : (
>
> Also, \d{2} will only match two single digits, and won't match the last two digits of 101, so at least this is fine! : )
>
> Any other idea to improve that code? I'm starting to get desperate...
>
> Thanks again for your help anyways, I really appreciate it! ; )
>
Ah, OK. I see what the problem is. (I should've marked it as "untested". :-()

It matches r"yo\s?" against "yo " (the r"\s?" consumes the space) and
then tries the "son" alternative against "son", which matches, but that's
inside a _negative_ lookahead, so the lookahead _fails_, and the regex
backtracks.

It retries the r"\s?", which now matches an empty string (doesn't
consume the space), and then tries the "son" alternative against " son",
which fails, but that's a _negative_ lookahead, so it _succeeds_.

And the regex as a whole matches.

Ideally I'd want to use a possessive quantifier or atomic group, but 
they aren't supported by the re module, so the workaround is to move the 
check for whitespace:

r"\d{2}\s?(?=(?:years old|yo|yr old|y o|yrs old|year old)(?!\s?son|\s?daughter|\s?kid|\s?child))"
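
A quick way to check the workaround is to run it over a few sample
strings. The samples and the names "fixed" and "results" below are
invented for this sketch; the pattern is the one given above, split
across two string literals only to keep the lines short.

```python
import re

# The optional whitespace now lives inside the negative lookahead, so
# there is nothing left for the alternatives to give back on backtracking.
fixed = re.compile(
    r"\d{2}\s?(?=(?:years old|yo|yr old|y o|yrs old|year old)"
    r"(?!\s?son|\s?daughter|\s?kid|\s?child))"
)

samples = [
    "25 yo",
    "25 yo son",
    "25yo son",
    "40 years old",
    "40 years old daughter",
]

# Map each sample to True if an age is found, False otherwise.
results = {s: bool(fixed.search(s)) for s in samples}

for text, found in results.items():
    print(f"{text!r}: {found}")
```

Only "25 yo" and "40 years old" should be reported as matches. As an
aside, Python 3.11 and later do accept possessive quantifiers and atomic
groups in the re module, so on those versions r"yo\s?+" may be another
option; that variant is not exercised here.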



