Regular Expressions: large amount of or's

Daniel Yoo dyoo at hkn.eecs.berkeley.edu
Sun Mar 13 03:15:07 EST 2005


: Otherwise, you may want to look at a specialized data structure for
: doing multiple keyword matching; I had an older module that wrapped
: around a suffix tree:

:    http://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/

: It looks like other folks, thankfully, have written other
: implementations of suffix trees:

:    http://cs.haifa.ac.il/~shlomo/suffix_tree/
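The idea behind the suffix-tree approach is to index the text once, so that each keyword lookup then costs time proportional to the keyword's length rather than the text's. Below is a minimal sketch of that idea, assuming an uncompressed suffix trie built from nested dicts. It is quadratic in the text length and is not the code from either suffix-tree link. The names build_suffix_trie, contains and keywords_present are hypothetical, and the sketch only answers whether each keyword occurs, not where.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie made of nested dicts.

    This is the uncompressed, quadratic-time version: fine for small texts,
    but a real suffix tree compresses edges and can be built in linear time
    (e.g. with Ukkonen's algorithm).
    """
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, keyword):
    """Every substring is a prefix of some suffix, so walk from the root."""
    node = trie
    for ch in keyword:
        if ch not in node:
            return False
        node = node[ch]
    return True


def keywords_present(text, keywords):
    """Index text once, then check each keyword in O(len(keyword)) steps."""
    trie = build_suffix_trie(text)
    return [kw for kw in keywords if contains(trie, kw)]


if __name__ == "__main__":
    print(keywords_present("the quick brown fox", ["quick", "fox", "dog"]))
    # ['quick', 'fox']
```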

: Another approach is something called the Aho-Corasick algorithm:

:    http://portal.acm.org/citation.cfm?doid=360825.360855

: though I haven't been able to find a nice Python module for this yet.
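To give a feel for what Aho-Corasick does, here is a minimal pure-Python sketch of the textbook construction: a trie of the keywords plus breadth-first "failure links," so the text is scanned once no matter how many keywords there are. It is not the code behind any module mentioned in this thread. The function names build_automaton and find_all are made up for illustration, and the sketch assumes non-empty keywords.

```python
from collections import deque


def build_automaton(keywords):
    """Build Aho-Corasick tables (goto, fail, output) for non-empty keywords."""
    goto = [{}]    # goto[state][char] -> child state; this is the keyword trie
    fail = [0]     # fail[state] -> state to fall back to on a mismatch
    output = [[]]  # output[state] -> keywords recognized on reaching state

    # Phase 1: insert every keyword into the trie.
    for kw in keywords:
        state = 0
        for ch in kw:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(kw)

    # Phase 2: breadth-first pass computing failure links. A state's link
    # points at the longest proper suffix of its path that is also in the
    # trie; depth-1 states keep the root (0) as their link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state as well.
            output[child] = output[child] + output[fail[child]]
    return goto, fail, output


def find_all(text, automaton):
    """Yield (start, keyword) for every occurrence, overlaps included."""
    goto, fail, output = automaton
    state = 0
    for i, ch in enumerate(text):
        # Follow failure links until some state can consume ch (or we hit root).
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for kw in output[state]:
            yield i - len(kw) + 1, kw


if __name__ == "__main__":
    machine = build_automaton(["he", "she", "his", "hers"])
    print(list(find_all("ushers", machine)))
    # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Breadth-first order matters here: a failure link always points to a shallower state, so that state's output list is already complete when a deeper state copies it.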


Followup on this: I haven't been able to find one, so I took someone
else's implementation and adapted it.  *grin*

Here you go:

    http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

This provides 'ahocorasick', a Python C extension module for matching
text against a set of keywords.  I'll start writing out the package
announcements tomorrow.
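For comparison with the "large amount of or's" in the subject line, here is a sketch of the single-regex approach using only the standard re module. The names keyword_regex and find_keywords are hypothetical. One behavioral difference is worth knowing. re.finditer returns only non-overlapping matches. The Aho-Corasick algorithm, as usually described, reports every occurrence, overlapping ones included.

```python
import re


def keyword_regex(keywords):
    """Compile one alternation pattern ("kw1|kw2|...") for non-empty keywords.

    Longest keywords go first because Python's re tries alternatives from left
    to right and stops at the first that matches; re.escape makes characters
    such as '.' or '+' match literally.
    """
    ordered = sorted(keywords, key=len, reverse=True)
    return re.compile("|".join(re.escape(kw) for kw in ordered))


def find_keywords(text, keywords):
    """Return (start, keyword) for each non-overlapping match, left to right."""
    pattern = keyword_regex(keywords)
    return [(m.start(), m.group()) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_keywords("ushers", ["he", "she", "his", "hers"]))
    # [(1, 'she')]: "he" and "hers" overlap that match, so finditer skips them
```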


I hope this helps!


