How to check if any item from a list of strings is in a big string?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Tue Jul 14 01:06:04 EDT 2009


On Mon, 13 Jul 2009 10:11:09 -0300, denis <denis-bz-gg at t-online.de> wrote:

> Matt, how many words are you looking for, in how long a string ?
> Were you able to time any( substr in long_string ) against re.compile
> ( "|".join( list_items )) ?
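
For reference, the two approaches being compared could be sketched roughly
as below. The function names any_substring and any_regex are made up for
this illustration, and the re.escape() call is an addition not present in
the quoted snippet; without it, items containing regex metacharacters
would be interpreted as patterns rather than literal text.

```python
import re

def any_substring(words, text):
    # Generator expression: stops at the first word found in text.
    return any(word in text for word in words)

def any_regex(words, text):
    # One alternation of all words, escaped so they match literally.
    words = list(words)
    if not words:
        # An empty alternation would match everywhere, so handle it here.
        return False
    pattern = re.compile("|".join(re.escape(word) for word in words))
    return pattern.search(text) is not None
```

Either function can be wrapped in timeit to compare them on real data;
which one wins is likely to depend on the number of words and the length
of the string.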

There is a known algorithm that solves specifically this problem  
(Aho-Corasick); a good implementation should perform better than a regular  
expression (and better than the generator expression, with the added  
advantage of returning WHICH string matched).
There is a C extension somewhere implementing Aho-Corasick.
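
To give an idea of what the algorithm does, here is a minimal pure-Python
sketch of Aho-Corasick. It is only an illustration, not the C extension
mentioned above, and it is unlikely to be fast. The names build_automaton,
find_all and first_match are invented for this example. It builds a trie
of the words, adds failure links with a breadth-first pass, and then scans
the text once, reporting every (start_index, word) match.

```python
from collections import deque

def build_automaton(words):
    # Node 0 is the root. goto[n] maps a character to a child node,
    # fail[n] is the failure link, out[n] lists the words ending at n.
    goto, fail, out = [{}], [0], [[]]
    for word in words:
        if not word:
            continue  # skip empty strings, they would match everywhere
        node = 0
        for ch in word:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(word)
    # Breadth-first pass: depth-1 nodes keep fail == 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            queue.append(child)
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit words that end at the failure target (proper suffixes).
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out

def find_all(automaton, text):
    # Yield (start_index, word) for every occurrence, in one pass over text.
    goto, fail, out = automaton
    node = 0
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            yield i - len(word) + 1, word

def first_match(automaton, text):
    # Return the first (start_index, word) found, or None if nothing matches.
    return next(find_all(automaton, text), None)
```

The automaton is built once and can be reused for many strings, e.g.
first_match(build_automaton(list_items), long_string), which also tells
you which item matched and where.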

-- 
Gabriel Genellina



