re.match question

Dan Schmidt dfan at harmonixmusic.com
Thu Oct 5 13:52:21 EDT 2000


Thomas Svensson <thomas.svensson at era.ericsson.se> writes:

| Hello,
| 
| I'm matching a list of strings. I want "xxxBar" to be a match but not
| "xxxFooBar", xxx can be anything. I've tried the following:
| 
| re.match('^.*?(?!Foo)Bar$', word)
| 
| re.match('^.*?(Foo){0}Bar$', word)
|
| But they both return a match. What am I doing wrong? I think this
| should be an easy thing to do.

You're being fooled by how you think regexps should work, as opposed
to how they do work.

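In your first regexp, ^.*? matches 'xxxFoo', then (?!Foo) checks that
the text coming up next is not 'Foo' (it isn't; it's 'Bar'), and then
Bar$ matches 'Bar'.  A lookahead only inspects what follows the current
position, never what has already been consumed.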

In the second one, ^.*? matches 'xxxFoo', then (Foo){0} matches the
empty string, and then Bar$ matches 'Bar'.
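
To see this concretely, here is a small sketch of both attempts.  The
capturing group around .*? is my addition (it is not in the original
patterns); it is only there so we can print what that part consumed.

```python
import re

word = 'xxxFooBar'

# Attempt 1: the lookahead is tested at the position just before 'Bar'.
# The next characters there are 'Bar', not 'Foo', so the assertion passes.
m1 = re.match(r'^(.*?)(?!Foo)Bar$', word)
print(m1.group(1))   # xxxFoo

# Attempt 2: (Foo){0} means "exactly zero copies of Foo", i.e. the
# empty string, so it constrains nothing.
m2 = re.match(r'^(.*?)(Foo){0}Bar$', word)
print(m2.group(1))   # xxxFoo
```

In both cases the 'Foo' was quietly swallowed by .*? before the
interesting part of the pattern was ever tried.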

Here's a nasty way that seems to work:

  re.match('^.*(?!Foo)...Bar$', word)

This way we require three characters before Bar, and before consuming
them, make sure that they're not Foo.  [Oops: this fails for 'xxBar',
which has fewer than three characters before Bar!  See below.]
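
A quick check of that pattern against the three interesting inputs
(the word list is just my own test data):

```python
import re

pattern = re.compile(r'^.*(?!Foo)...Bar$')

# Map each test word to whether the pattern matches it.
results = {}
for word in ('xxxBar', 'xxxFooBar', 'xxBar'):
    results[word] = bool(pattern.match(word))
    print(word, results[word])

# xxxBar True
# xxxFooBar False
# xxBar False    <- the false negative: only two characters before Bar
```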

You could also do something like '^.*(...)Bar$' and then check what
you captured inside (...) afterwards.
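
A minimal sketch of the capture-and-check idea.  Note two changes that
are my assumptions, not part of the suggestion above: the group is
.{0,3} rather than ... so that short words like 'xxBar' still match,
and the leading .* is made non-greedy so the group really ends up
holding the characters right before Bar.  The function name is made up.

```python
import re

def bar_but_not_foobar(word):
    # Group 1 captures up to three characters directly before the
    # trailing 'Bar'; the lazy .*? lets the group grab as many as it can.
    m = re.match(r'^.*?(.{0,3})Bar$', word)
    # Reject non-matches, and matches whose captured prefix is 'Foo'.
    return m is not None and m.group(1) != 'Foo'

for word in ('xxxBar', 'xxxFooBar', 'xxBar', 'Bar'):
    print(word, bar_but_not_foobar(word))
```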

Or do '^.*(Bar)$', call start(1) on the Match object that's returned
to find out what index Bar is at (plain start() would just give 0, the
start of the whole match), and check the substring of up to three
characters before it.
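
Roughly like this.  I've put Bar in a group so that start(1) reports
where Bar itself begins; the function name is hypothetical.

```python
import re

def ends_in_bar_without_foo(word):
    m = re.match(r'^.*(Bar)$', word)
    if m is None:
        return False
    i = m.start(1)                    # index where the final 'Bar' begins
    # max() keeps the slice start from going negative on short words.
    return word[max(0, i - 3):i] != 'Foo'

for word in ('xxxBar', 'xxxFooBar', 'xxBar'):
    print(word, ends_in_bar_without_foo(word))
```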

Trying to do it completely within the regexp language is probably the
messiest choice.
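
For comparison, if the strings really are as fixed as 'Foo' and 'Bar',
one could skip regexps altogether.  This sketch assumes string methods
are available (str.endswith) and uses a made-up function name:

```python
def wanted(word):
    # Ends in Bar, but the Bar is not immediately preceded by Foo.
    return word.endswith('Bar') and not word.endswith('FooBar')

words = ['xxxBar', 'xxxFooBar', 'xxBar', 'Bar', 'FooBar', 'xxxBaz']
matches = [w for w in words if wanted(w)]
print(matches)   # ['xxxBar', 'xxBar', 'Bar']
```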

As I was about to send this, I looked at the Perl regexp
documentation, and it actually mentions your exact case:

     `/foo(?!bar)/' matches any occurrence of "foo" that isn't followed
     by "bar".  Note however that lookahead and lookbehind are NOT the
     same thing.  You cannot use this for lookbehind: `/(?!foo)bar/' will
     not find an occurrence of "bar" that is preceded by something which
     is not "foo".  That's because the `(?!foo)' is just saying that the
     next thing cannot be "foo"--and it's not, it's a "bar", so "foobar"
     will match.  You would have to do something like `/(?!foo)...bar/'
     for that.  We say "like" because there's the case of your "bar" not
     having three characters before it.  You could cover that this way:
     `/(?:(?!foo)...|^..?)bar/'.  Sometimes it's still easier just to
     say:

          if (/bar/ && $` !~ /foo$/)
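
Here is a Python rendering of the alternation trick from that passage,
as a sketch rather than anything definitive.  One deliberate change:
the quoted ^..? needs at least one character before bar, so I use
^.{0,2} to also accept a bare 'Bar' (my reading of "xxx can be
anything").  For comparison it also shows a negative lookbehind,
(?<!Foo), which assumes your version of the re module supports
lookbehind assertions.

```python
import re

# Either three non-Foo characters before Bar, or Bar within the
# first two characters of the string.
alternation = re.compile(r'(?:(?!Foo)...|^.{0,2})Bar$')

# Negative lookbehind: Bar at the end, not immediately preceded by Foo.
lookbehind = re.compile(r'(?<!Foo)Bar$')

words = ['xxxBar', 'xxxFooBar', 'xxBar', 'Bar', 'FooBar']
for w in words:
    print(w, bool(alternation.search(w)), bool(lookbehind.search(w)))
```

Both patterns accept 'xxxBar', 'xxBar' and 'Bar', and reject
'xxxFooBar' and 'FooBar'.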

-- 
                 Dan Schmidt | http://www.dfan.org
Honest Bob CD now available! | http://www.dfan.org/honestbob/cd.html


