Negative Lookahead Problem

Thu Dec 5 13:17:21 EST 2002

On Thu, Dec 05, 2002 at 12:11:54PM +0000, Roger Day wrote:
> The following code between the dashes:
> ----------------------------------------
> import re
> 
> path = ["/users/dibbl","users/rsrc"]
> print "expect 1st element to be selected"
> m = re.compile(".*(?!rsrc$)")
> for p in path:
>     kk = m.match( p )
>     if kk:
>         print p
> print "expect 2nd element to be selected"
> m = re.compile(".*(?=rsrc$)")
> for p in path:
>     kk = m.match( p )
>     if kk:
>         print p
> ---------------------------------------
> produces this output:
> 
> expect 1st element to be selected
> /users/dibbl
> users/rsrc
> expect 2nd element to be selected
> users/rsrc
> 
> In the first test case, the negative-lookahead "fails"
>  and the pattern selects both elements. It looks like 
> my understanding of negative-lookahead is woefully 
> short of the mark. Can someone please explain the 
> behaviour of negative lookahead in this case?

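.* is greedy, so it first consumes the whole string:

.*        -> "users/rsrc"
(?!rsrc$) -> tested at the end of the string, where nothing follows,
             so "rsrc$" cannot match there and the negative
             lookahead succeeds

Other combinations are possible, like "users/r" + "src", but the
Python RE engine returns the first match it finds, and the greedy
attempt already succeeds.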
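
To see what the first pattern actually consumes, here is a small sketch that reuses the list and pattern from the original post. The name "matched" is mine, for illustration only.

```python
import re

path = ["/users/dibbl", "users/rsrc"]

# Pattern from the original post: greedy .* followed by a negative lookahead.
pat = re.compile(".*(?!rsrc$)")

# Collect the text consumed by each match. Both strings match in full,
# because the lookahead is only tested after .* has reached the end.
matched = [pat.match(p).group(0) for p in path]
print(matched)
```

In both cases the lookahead is evaluated at the end of the string, which is why it never rejects anything.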

> Is 
> there another way of selecting, by regular expression, 
> a line which doesn't end in a certain set of characters?

How about a non-RE solution?

for p in path:
    if not p.endswith("rsrc"):
        print p

Or, if you must use an RE:

skip = re.compile(...) # pattern to skip
for p in path:
    if skip.search(p) is None:
        # p does not match the skip pattern; do whatever...
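
If a single pattern is really wanted, the assertion has to be tied to a fixed position instead of following a greedy .*. The following is only a sketch of two possible forms, not necessarily the only ones. The variable names are made up for illustration, and it assumes the strings carry no trailing newline.

```python
import re

path = ["/users/dibbl", "users/rsrc"]

# Lookahead tested at the start: reject the string if it ends in "rsrc".
not_rsrc_ahead = re.compile(r"^(?!.*rsrc$)")

# Lookbehind tested at the end: the last four characters must not be "rsrc".
not_rsrc_behind = re.compile(r"(?<!rsrc)$")

selected_ahead = [p for p in path if not_rsrc_ahead.match(p)]
selected_behind = [p for p in path if not_rsrc_behind.search(p)]
print(selected_ahead)
print(selected_behind)
```

Both forms select only the first element here. The lookbehind form needs search() rather than match(), since the pattern can only succeed at the end of the string.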

Inyeol...