Regular Expressions: large amount of or's
Daniel Yoo
dyoo at hkn.eecs.berkeley.edu
Sun Mar 13 03:15:07 EST 2005
: Otherwise, you may want to look at a specialized data structure for
: doing multiple keyword matching; I had an older module that wrapped
: around a suffix tree:
: http://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/
: It looks like other folks, thankfully, have written other
: implementations of suffix trees:
: http://cs.haifa.ac.il/~shlomo/suffix_tree/
: Another approach is something called the Aho-Corasick algorithm:
: http://portal.acm.org/citation.cfm?doid=360825.360855
: though I haven't been able to find a nice Python module for this yet.
Followup on this: I haven't been able to find one, so I took someone
else's implementation and adapted it. *grin*
Here you go:
http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/
This provides an 'ahocorasick' Python C extension module for matching
text against a set of keywords. I'll start writing out the package
announcements tomorrow.
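
For anyone curious what the algorithm does under the hood, here is a minimal pure-Python sketch of Aho-Corasick matching. It is not the API of the C extension above; the names build_automaton and find_all are made up for illustration, and it assumes the keywords are non-empty strings. The idea is to build a trie of the keywords, add "failure" links that say where to resume after a mismatch, and then scan the text once.

```python
from collections import deque


def build_automaton(keywords):
    """Build a keyword trie with failure links (hypothetical helper)."""
    goto = [{}]   # goto[node][char] -> child node index
    fail = [0]    # fail[node] -> node to fall back to on a mismatch
    out = [[]]    # out[node] -> keywords that end at this node

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        node = 0
        for ch in word:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(word)

    # Phase 2: breadth-first pass to compute failure links.
    # Depth-1 nodes keep the default failure link to the root.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit keywords that end at the fallback node.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, keyword) pairs for every match in text."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until some node accepts this character.
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            matches.append((i - len(word) + 1, word))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
```

The text is scanned once, left to right, no matter how many keywords there are, which is the main appeal over a regular expression with a very large number of alternatives.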
I hope this helps!