Most efficient method to search text?

Tim Peters tim.one at comcast.net
Wed Oct 16 12:42:41 EDT 2002


[Michael Hudson]
> Here's a way to quickly (I hope! Haven't done any benchmarks) tell if
> one of a bunch of words is contained in a chunk of text, assuming the
> words are known beforehand [...]

[Jeff Epler]
> Is there any reason to suppose that this is more efficient than using
> re.compile("|".join(re.escape(words))).match?

Time it.

> I haven't looked at the implementation of sre, but it should be able to
> generate a simple DFA for this RE and execute something very much like
> your 'match' function, but at C speeds.

re doesn't build a DFA.  Alternatives are searched for one at a time, left
to right.  See Friedl's "Mastering Regular Expression" book for long
discussions of the tradeoffs.





More information about the Python-list mailing list