regex-strategy for finding *similar* words?

Daniel Dittmar daniel.dittmar at sap.corp
Thu Nov 18 08:32:53 EST 2004


Christoph Pingel wrote:
> an interesting problem for regex nerds.
> I've got a thesaurus of some hundred words and a moderately large 
> dataset of about 1 million words in some thousand small texts. Words 
> from the thesaurus appear at many places in my texts, but they are often 
> misspelled, just slightly different from the thesaurus.

There is the agrep project (http://www.tgries.de/agrep/), which has Python
bindings. agrep (= approximate grep) lets you specify how many errors a
match may contain.
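To illustrate what "number of allowed errors" means, here is a minimal
pure-Python sketch that uses Levenshtein edit distance (insertions,
deletions, substitutions) as the error count. It is NOT the agrep bindings'
API. The names edit_distance and find_similar are hypothetical, and the
sketch simply compares every token against every thesaurus word. That is
fine for trying out the idea, but it will be slow on a million words,
which is exactly what agrep is built to avoid.

```python
import re

def edit_distance(a, b, max_errors):
    """Levenshtein distance between a and b, capped: returns max_errors + 1
    as soon as the distance is known to exceed max_errors."""
    if abs(len(a) - len(b)) > max_errors:
        return max_errors + 1  # length difference alone needs too many edits
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        # Row minima never decrease, so once every cell exceeds the limit
        # the final distance must exceed it too.
        if min(cur) > max_errors:
            return max_errors + 1
        prev = cur
    return prev[-1] if prev[-1] <= max_errors else max_errors + 1

def find_similar(text, thesaurus, max_errors=1):
    """Yield (token, thesaurus_word, distance) for every token in text that is
    within max_errors edits of a thesaurus word (case-insensitive)."""
    for token in re.findall(r"\w+", text.lower()):
        for word in thesaurus:
            d = edit_distance(token, word.lower(), max_errors)
            if d <= max_errors:
                yield token, word, d

for hit in find_similar("A thesarus lists colours and synonyms.",
                        ["thesaurus", "colour", "synonym"]):
    print(hit)
```

If you would rather stay in the standard library without writing the
distance function yourself, difflib.get_close_matches(word, possibilities,
n, cutoff) does something similar, but its cutoff is a similarity ratio
between 0 and 1 rather than a count of errors.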

Daniel


