Fuzzy string comparison

jmw jwickard at gmail.com
Thu Dec 28 10:17:52 EST 2006


Gabriel Genellina wrote:
> At Tuesday 26/12/2006 18:08, John Machin wrote:

> > > > I'm looking for a module to do fuzzy comparison of strings. [...]

> Other alternatives: trigram, n-gram, Jaro's distance. There are some
> Python implementations available.

Quick question: you mentioned that the data you need to compare is
stored in a database.  Is this a one-time job to clean up the data, or
will you have to run fuzzy string comparisons against the database on
an ongoing basis?  There are some papers out there on implementing
n-gram string comparison entirely in SQL, so that you don't have to
pull all the data out of your tables to do the fuzzy matching.  I can
dig up some code I wrote a while ago and post it (it's in Java).
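
In the meantime, here is a rough Python sketch of the idea, using the
standard sqlite3 module with an in-memory database.  It is not the Java
code mentioned above and not taken from any particular paper.  The table
names (names, name_grams), the helper names (trigrams, add_name,
fuzzy_lookup), the padding scheme and the 0.3 threshold are all
assumptions made up for illustration.  Each string is split into
trigrams when it is inserted, and the lookup scores candidates with a
Dice coefficient computed inside the SQL query, so only the matching
rows come back to Python.

```python
import sqlite3


def trigrams(s):
    """Return the set of 3-character grams of s (lowercased, space-padded)."""
    padded = "  " + s.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


# Hypothetical schema: one row per string, plus one row per (string, trigram).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT, n_grams INTEGER)"
)
conn.execute("CREATE TABLE name_grams (name_id INTEGER, gram TEXT)")
conn.execute("CREATE INDEX idx_name_grams_gram ON name_grams (gram)")


def add_name(name):
    """Store a string together with its trigrams."""
    grams = trigrams(name)
    cur = conn.execute(
        "INSERT INTO names (name, n_grams) VALUES (?, ?)", (name, len(grams))
    )
    conn.executemany(
        "INSERT INTO name_grams (name_id, gram) VALUES (?, ?)",
        [(cur.lastrowid, g) for g in grams],
    )


def fuzzy_lookup(query, threshold=0.3):
    """Return (name, score) rows whose Dice score is at least threshold."""
    grams = sorted(trigrams(query))
    placeholders = ",".join("?" * len(grams))
    # Dice coefficient: 2 * shared grams / (query grams + stored grams).
    sql = f"""
        SELECT name, score FROM (
            SELECT n.name AS name,
                   2.0 * COUNT(*) / (? + n.n_grams) AS score
            FROM name_grams g
            JOIN names n ON n.id = g.name_id
            WHERE g.gram IN ({placeholders})
            GROUP BY n.id, n.name, n.n_grams
        )
        WHERE score >= ?
        ORDER BY score DESC, name
    """
    return conn.execute(sql, [len(grams), *grams, threshold]).fetchall()


# Sample data, purely illustrative.
for sample in ("John Machin", "Jon Machin", "Gabriel Genellina"):
    add_name(sample)

for name, score in fuzzy_lookup("John Machin"):
    print(name, round(score, 2))
```

The index on name_grams.gram is what lets the database narrow the
candidates without scanning every stored string.  A one-time cleanup job
could probably skip the extra table and just compare trigram sets in
Python; the SQL route mainly pays off when the lookups are continual.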



