Most efficient method to search text?

Jeff Epler jepler at unpythonic.net
Wed Oct 16 10:52:06 EDT 2002


On Wed, Oct 16, 2002 at 01:35:09PM +0000, Michael Hudson wrote:
> Here's a way to quickly (I hope! Haven't done any benchmarks) tell if
> one of a bunch of words is contained in a chunk of text, assuming the
> words are known beforehand [...]

Is there any reason to suppose that this is more efficient than using
re.compile("|".join(map(re.escape, words))).search?  I haven't looked at
the implementation of sre, but it should be able to generate a simple DFA
for this RE and execute something very much like your 'match' function,
but at C speeds.
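
To make the suggestion concrete, here is a minimal sketch of the regex
approach. The helper name compile_word_search and the sample words are
made up for illustration. Note that re.escape takes a single string, so
it is applied to each word in turn, and that .search is used because
.match only tries the start of the text.

```python
import re

def compile_word_search(words):
    # Hypothetical helper, not from the original thread.
    # Longest words first, so a longer word is tried before its own prefix.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape works on one string at a time, so escape each word separately.
    pattern = "|".join(re.escape(w) for w in ordered)
    # .search scans the whole text; .match would only test position 0.
    return re.compile(pattern).search

find_word = compile_word_search(["spam", "eggs", "a.b"])
hit = find_word("green eggs and ham")
print(hit.group(0) if hit else None)
```

The returned bound method gives a match object (or None), so the caller
can also find out which word matched and where.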

Having one dict lookup per character scanned seems like a pretty huge
chunk of overhead.
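
For comparison, this is roughly the kind of pure-Python scan I have in
mind when I talk about one dict lookup per character. It is only my own
sketch of a dict-based trie, not the code from the quoted message (which
is elided above), and the names build_trie and contains_any are
assumptions.

```python
def build_trie(words):
    # Hypothetical nested-dict trie: one dict level per character.
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word  # the None key marks the end of a complete word
    return root

def contains_any(trie, text):
    # Try every start position; each step costs one dict lookup in Python.
    n = len(text)
    for start in range(n):
        node = trie
        i = start
        while True:
            if None in node:
                return True  # reached the end of some known word
            if i == n:
                break
            node = node.get(text[i])
            if node is None:
                break  # no known word continues with this character
            i += 1
    return False

trie = build_trie(["spam", "eggs"])
print(contains_any(trie, "green eggs and ham"))
```

Every step of the inner loop goes through the interpreter, which is the
overhead I would expect a compiled RE to avoid. Timing both on real data
would settle it.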

Jeff



