re TERRIBLE performance -- progress and request for more help

Skip Montanaro skip at mojam.com
Tue Oct 5 11:45:29 EDT 1999


    russell> The following is a simplification of my original re.  (I have
    russell> omitted symbolic tags, etc.)

    russell> (lit1                       # match opening lit1
    russell>    ( \s+                    #   skip white space
    russell>    | (!=(lit2|lit3)[^\s"]+) #   skip string except lit2 or lit3
    russell>    | ("[^"]*")              #   skip any quoted string
    russell>    )*                       # After any number of above
    russell>    (lit2|lit3) [^\sc]+      # match lit2 or lit3 and target
    russell>    [^c]*                    # match anything except char c
    russell>    c                        # match closing c
    russell> )

    russell> Even though (!= ) is a zero-space recognizer, I had thought it
    russell> would be compiled into something fast, since it was excluding
    russell> potential string matches.  The following revision solved the
    russell> problem:

I don't recognize (!=...) as valid re syntax for a lookahead.  Reviewing the
re syntax page, I saw:

    (?=...)
          Matches if ... matches next, but doesn't consume any of the
          string. This is called a lookahead assertion. For example,
          "Isaac (?=Asimov)" will match 'Isaac ' only if it's followed by
          'Asimov'.

    (?!...)
          Matches if ... doesn't match next. This is a negative lookahead
          assertion. For example, "Isaac (?!Asimov)" will match 'Isaac '
          only if it's not followed by 'Asimov'.
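
The two assertions quoted above can be checked with a short sketch.  It
assumes only the standard re module; the variable names are made up for
illustration.

```python
import re

# Positive lookahead: 'Isaac ' matches only when 'Asimov' follows.
positive = re.compile(r"Isaac (?=Asimov)")
# Negative lookahead: 'Isaac ' matches only when 'Asimov' does not follow.
negative = re.compile(r"Isaac (?!Asimov)")

pos_hit = positive.match("Isaac Asimov")    # lookahead succeeds
pos_miss = positive.match("Isaac Newton")   # lookahead fails, no match
neg_hit = negative.match("Isaac Newton")    # negative lookahead succeeds
neg_miss = negative.match("Isaac Asimov")   # negative lookahead fails

# The assertion consumes nothing, so a successful match covers only 'Isaac '.
print(repr(pos_hit.group(0)), pos_miss)
print(repr(neg_hit.group(0)), neg_miss)
```

In both successful cases the matched text stops before the surname, which
is what "doesn't consume any of the string" means in practice.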

Did your message just contain a transcription error, or is that the way you
had it coded in your application?  Or did I miss something on the re syntax
page?

    http://www.python.org/doc/lib/re-syntax.html
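
If the pattern really was coded as posted, one way to see the difference is
the sketch below.  It assumes that (!=...) is read by re as an ordinary
group starting with the literal characters "!=", and it treats lit2 and
lit3 as literal placeholder strings, as in the quoted message.  The
(?!...) form is only a guess at what was intended.

```python
import re

# As posted: a plain group that requires the literal text '!=' up front.
as_posted = re.compile(r'(!=(lit2|lit3)[^\s"]+)')
# Guessed intent: a word that does not start with lit2 or lit3.
as_lookahead = re.compile(r'((?!lit2|lit3)[^\s"]+)')

posted_on_literal = as_posted.match("!=lit2abc")   # matches the literal '!='
posted_on_word = as_posted.match("foo")            # no '!=' present, no match

lookahead_on_word = as_lookahead.match("foo")      # ordinary word is skipped
lookahead_on_lit2 = as_lookahead.match("lit2abc")  # rejected by the assertion

print(posted_on_literal.group(0), posted_on_word)
print(lookahead_on_word.group(0), lookahead_on_lit2)
```

Under that assumption the posted alternative never skips an ordinary word
at all, so it behaves quite differently from a negative lookahead.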




