Does my RE rock, or suck?

Gustavo Niemeyer niemeyer at conectiva.com
Fri Jul 9 15:53:24 EDT 2004


[...]
> In my example I compile the RE:
> 
>     '^(Mary|John( Smith( (Sr|Jr)|)|)|Bob)'
> 
> If the RE compiler is smart, searching should involve scanning
> the longest prefix that could still match, plus looking at just

This won't happen right now. Minimum and maximum lenght is currently
only checked for the whole pattern, not for every branch.

> one character for each branch-point along the way.  (If it uses
> a jump-table at decision points, it should be truly linear.)

OTOH, there's some intelligence for this specific case. The first
character is checked even before "entering" the branch (IOW,
preparing the enviornment for checking the branch), so your
pattern should match really fast.

It will be something close to:

(1) Entering a branch, save current state as necessary.
(2) Check first character, does it match?
(3) If yes, get into the branch, saving state as necessary, and
    checking other characters, subpatterns, etc.
(4) No, jump to next alternative, go back to (2).
(5) Yes, leave the current branch with success, restoring states
    as necessary.

-- 
Gustavo Niemeyer
http://niemeyer.net



More information about the Python-list mailing list