Most efficient method to search text?
Jeff Epler
jepler at unpythonic.net
Wed Oct 16 10:52:06 EDT 2002
On Wed, Oct 16, 2002 at 01:35:09PM +0000, Michael Hudson wrote:
> Here's a way to quickly (I hope! Haven't done any benchmarks) tell if
> one of a bunch of words is contained in a chunk of text, assuming the
> words are known beforehand [...]
Is there any reason to suppose that this is more efficient than using
re.compile("|".join(map(re.escape, words))).search? I haven't looked at the
implementation of sre, but it should be able to generate a simple DFA
for this RE and execute something very much like your 'match' function,
but at C speeds.
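
To make the regex suggestion concrete, here is a minimal sketch. The word list and the helper name contains_any are made up for illustration. Note that re.escape() takes a single string, so each word is escaped separately before joining, and that search() rather than match() is what finds a word anywhere in the text (match() only looks at the start).

```python
import re

# Hypothetical word list, purely for illustration.
words = ["spam", "eggs", "a.b"]

# re.escape() works on one string at a time, so escape each word
# individually and join the results into a single alternation.
pattern = re.compile("|".join(map(re.escape, words)))

def contains_any(text):
    """Return True if any of the words occurs anywhere in text."""
    # search() scans the whole string; match() would only test position 0.
    return pattern.search(text) is not None
```

Escaping matters for a word like "a.b": without it the dot would match any character, so "axb" would count as a hit.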
Having one dict lookup per character scanned seems like a pretty huge
chunk of overhead.
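
For comparison, this is roughly the kind of pure-Python scan I have in mind when I talk about one dict lookup per character. It is not Michael's code (which is elided above), only my guess at the general shape: a trie built from nested dicts, with hypothetical names build_trie and trie_contains_any.

```python
def build_trie(words):
    """Build a trie of nested dicts; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root

def trie_contains_any(trie, text):
    """Return True if any word stored in the trie occurs in text."""
    n = len(text)
    for start in range(n):
        node = trie
        i = start
        while True:
            if None in node:            # reached the end of a stored word
                return True
            if i == n:                  # ran out of text
                break
            node = node.get(text[i])    # one dict lookup per character
            if node is None:            # no word continues with this character
                break
            i += 1
    return False
```

Every step of the inner loop runs through the interpreter, which is where I would expect the overhead to come from relative to a compiled pattern. I have not benchmarked either version.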
Jeff