Does my RE rock, or suck?

Bryan Olson fakeaddress at nowhere.org
Fri Jul 9 03:53:44 EDT 2004


Thanks Michael.  I still have more investigating to do.

Michael Hudson wrote:
 > Well, the re module is an NFA engine, not a DFA engine, so I'd expect
 > this re to run in time proportional to the number of triggers.

I guess I didn't point this out except for my example: I tried
to make the RE efficient even for a search-and-backtrack engine.
Not only are there no loop-like things, there are never multiple
choices that begin with the same character.  Any time it
backtracks, it will fail immediately at every choice point.

In my example I compile the RE:

     '^(Mary|John( Smith( (Sr|Jr)|)|)|Bob)'

If the RE compiler is smart, searching should involve scanning
the longest prefix that could still match, plus looking at just
one character for each branch-point along the way.  (If it uses
a jump-table at decision points, it should be truly linear.)


 > "Dozens to hundreds" sounds like a fairly small number in context,

I agree, though hundreds would make the longest RE string I've
ever used.

 > but: time it!  Only you know what fast enough is.

It seems fast.  Testing doesn't allay my fears of a bad worst-
case that an adversary might discover.


 > Here's a cute approach that should have running time independent
 > of the number of triggers:
[snip]

I've seen that kind of structure called a "trie" (Knuth cites
the name 'trie' to E. Fredkin, CACM 3; 1960).  When I generate
my RE, I sort of use the same trie structure.  My (trivial)
tests so far indicate that the RE form is faster in typical
cases.  Your trie-searcher is nice in that I have much more
confidence in the worst-case run-time.



 > Here's a less cute but probably saner approach:
[...]
 > this should run in time proportional to the number of different
 > lengths of trigger (and use rather less memory than my first version).

I'm not convinced.  The time to hash a string is proportional to
the length, so I think searches can take time proportional to
the sum of all the trigger lengths.


[I, Bryan asked:]
 >>     Am I likely to exceed recursion depth?
 >
 > Not in 2.4 :-)  And I doubt it before that, actually.

But Python is only up to ... oh, the smiley.  I get it.


 > This thread from a year or so back might interest:
 >
 > 
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=886c5e4b.0210151635.123359bb@posting.google.com

Thanks,

-- 
--Bryan



More information about the Python-list mailing list