Does my RE rock, or suck?
Bryan Olson
fakeaddress at nowhere.org
Fri Jul 9 03:53:44 EDT 2004
Thanks Michael. I still have more investigating to do.
Michael Hudson wrote:
> Well, the re module is an NFA engine, not a DFA engine, so I'd expect
> this re to run in time proportional to the number of triggers.
I guess I didn't point this out except for my example: I tried
to make the RE efficient even for a search-and-backtrack engine.
Not only are there no loop-like things, there are never multiple
choices that begin with the same character. Any time it
backtracks, it will fail immediately at every choice point.
In my example I compile the RE:
'^(Mary|John( Smith( (Sr|Jr)|)|)|Bob)'
If the RE compiler is smart, searching should involve scanning
the longest prefix that could still match, plus looking at just
one character for each branch-point along the way. (If it uses
a jump-table at decision points, it should be truly linear.)
> "Dozens to hundreds" sounds like a fairly small number in context,
I agree, though hundreds would make the longest RE string I've
ever used.
> but: time it! Only you know what fast enough is.
It seems fast. Testing doesn't allay my fears of a bad worst-
case that an adversary might discover.
> Here's a cute approach that should have running time independent
> of the number of triggers:
[snip]
I've seen that kind of structure called a "trie" (Knuth cites
the name 'trie' to E. Fredkin, CACM 3; 1960). When I generate
my RE, I sort of use the same trie structure. My (trivial)
tests so far indicate that the RE form is faster in typical
cases. Your trie-searcher is nice in that I have much more
confidence in the worst-case run-time.
> Here's a less cute but probably saner approach:
[...]
> this should run in time proportional to the number of different
> lengths of trigger (and use rather less memory than my first version).
I'm not convinced. The time to hash a string is proportional to
the length, so I think searches can take time proportional to
the sum of all the trigger lengths.
[I, Bryan asked:]
>> Am I likely to exceed recursion depth?
>
> Not in 2.4 :-) And I doubt it before that, actually.
But Python is only up to ... oh, the smiley. I get it.
> This thread from a year or so back might interest:
>
>
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=886c5e4b.0210151635.123359bb@posting.google.com
Thanks,
--
--Bryan
More information about the Python-list
mailing list