regular expression negate a word (not character)

Mark Tolonen mark.e.tolonen at mailinator.com
Fri Jan 25 23:40:23 EST 2008


"Summercool" <Summercoolness at gmail.com> wrote in message 
news:27249159-9ff3-4887-acb7-99cf0d2582a8 at n20g2000hsh.googlegroups.com...
>
> somebody who is a regular expression guru... how do you negate a word
> and grep for all words that is
>
>  tire
>
> but not
>
>  snow tire
>
> or
>
>  snowtire
>
> so for example, it will grep for
>
>  winter tire
>  tire
>  retire
>  tired
>
> but will not grep for
>
>  snow tire
>  snow   tire
>  some snowtires
>
> need to do it in one regular expression
>

What you want is a negative lookbehind assertion:

>>> re.search(r'(?<!snow)tire','snowtire')  # no match
>>> re.search(r'(?<!snow)tire','baldtire')
<_sre.SRE_Match object at 0x00FCD608>

Unfortunately, you also want to allow variable whitespace, and that fails:

>>> re.search(r'(?<!snow\s*)tire','snow tire')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\dev\python\lib\re.py", line 134, in search
    return _compile(pattern, flags).search(string)
  File "C:\dev\python\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: look-behind requires fixed-width pattern
>>>

Python's re module requires a lookbehind to match a fixed number of 
characters, so \s* cannot appear inside one.  Moving the whitespace outside 
the lookbehind doesn't work either:

>>> re.search(r'(?<!snow)\s*tire','snow tire')
<_sre.SRE_Match object at 0x00F93480>
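
The reason for that unwanted match can be checked directly. The following is 
a minimal sketch (written for Python 3, which is an assumption; the sessions 
above are Python 2) that looks at where the match starts and what text 
precedes it:

```python
import re

# \s* may match zero characters, so the engine is free to start at "t".
m = re.search(r'(?<!snow)\s*tire', 'snow tire')

# At that position the four characters before "tire" are "now ", not "snow",
# so the negative lookbehind is satisfied and the match succeeds.
start, end = m.span()
preceding = 'snow tire'[max(0, start - 4):start]
print(m.group(), (start, end), repr(preceding))
```

The match covers only "tire", so the lookbehind never gets to see "snow" 
followed by the space.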

Here's some code (not heavily tested) that emulates a variable-width negative 
lookbehind assertion, and a function that marks the matches in a string to 
demonstrate it:

### BEGIN CODE ###

import re

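def finditerexcept(pattern,notpattern,string):
    # Try notpattern first so it consumes any excluded text; a bare
    # pattern match can then only be found outside that text.
    for matchobj in re.finditer('(?:%s)|(?:%s)'%(notpattern,pattern),string):
        # Skip the matches that came from the notpattern alternative.
        if not re.match(notpattern,matchobj.group()):
            yield matchobj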

def markexcept(pattern,notpattern,string):
    substrings = []
    current = 0

    for matchobj in finditerexcept(pattern,notpattern,string):
        substrings.append(string[current:matchobj.start()])
        substrings.append('[' + matchobj.group() + ']')
        current = matchobj.end()

    substrings.append(string[current:])
    return ''.join(substrings)

### END CODE ###

>>> sample='''winter tire
... tire
... retire
... tired
... snow tire
... snow    tire
... some snowtires
... '''
>>> print markexcept('tire',r'snow\s*tire',sample)
winter [tire]
[tire]
re[tire]
[tire]d
snow tire
snow    tire
some snowtires
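
If a single regular expression is a hard requirement and grep-style line 
filtering is enough, a different technique may do: a negative lookahead 
applied to each line. This is only a sketch under that assumption (Python 3; 
the names LINE_WITH_TIRE and matching_lines are made up for illustration). 
Unlike the code above, it rejects a whole line that contains "snow tire" 
anywhere, even if the line also has another acceptable "tire".

```python
import re

# Hypothetical name. The lookahead rejects the line if "snow", optional
# whitespace, and "tire" occur anywhere in it; otherwise ".*tire" must match.
LINE_WITH_TIRE = re.compile(r'(?!.*snow\s*tire).*tire')

sample = """winter tire
tire
retire
tired
snow tire
snow    tire
some snowtires
"""

def matching_lines(text):
    # Keep only the lines the pattern accepts, the way grep would.
    return [line for line in text.splitlines() if LINE_WITH_TIRE.match(line)]

for line in matching_lines(sample):
    print(line)
```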

--Mark



