Counting elements in a list wildcard

Edward Elliott nobody at 127.0.0.1
Tue Apr 25 01:15:31 EDT 2006


Dave Hughes wrote:
> Another algorithm that might interest isn't based on "sounds-like" but
> instead computes the number of transforms necessary to get from one
> word to another: the Levenshtein distance. A C based implementation
> (with Python interface) is available:

I don't know what algorithm it uses, but the difflib module looks similar. 
I've had good results using the get_close_matches function to locate
similarly-named mp3 files.
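
For anyone who hasn't used it, here is a minimal sketch of that kind of lookup. The file names and the find_similar helper are made up for illustration; only difflib.get_close_matches itself comes from the standard library.

```python
import difflib

# Hypothetical file names, invented for this example.
filenames = [
    "01 - Blue Train.mp3",
    "02 - Moment's Notice.mp3",
    "03 - Locomotion.mp3",
]

def find_similar(name, candidates, cutoff=0.6):
    """Return up to three candidates that resemble name, best match first.

    cutoff is the minimum similarity ratio (0.0 to 1.0) a candidate
    must reach to be returned; 0.6 is the library's default.
    """
    return difflib.get_close_matches(name, candidates, n=3, cutoff=cutoff)

# A misspelled query still finds the intended file.
matches = find_similar("01 - Blue Trane.mp3", filenames)
print(matches[0])
```

Raising cutoff makes the matching stricter; lowering it returns looser matches.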

However, I don't think "close enough" matching is well suited to this
application.  The sequences are short and not very distinct, and difference
matching needs longer sequences to be effective.  Phoneme matching seems
overly complex and might grab things like Tsu-zi.  I'd just use a list of
alternate spellings like Ben suggested.
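
A rough sketch of the alternate-spellings approach, assuming a hand-maintained table. The spellings, the sample names, and the count_name helper are all hypothetical and not taken from the earlier posts.

```python
# Hypothetical table mapping a canonical name to the spellings we accept.
ALTERNATES = {
    "susie": {"susie", "suzie", "suzy", "susy"},
}

def count_name(names, canonical, alternates=ALTERNATES):
    """Count entries in names that match any listed spelling of canonical.

    Comparison ignores case and surrounding whitespace. A name with no
    entry in the table only matches itself.
    """
    spellings = alternates.get(canonical, {canonical})
    return sum(1 for n in names if n.strip().lower() in spellings)

# Made-up input; "Tsu-zi" is not in the table, so it is not counted.
names = ["Susie", "Bob", "suzy", "Suzie ", "Tsu-zi"]
total = count_name(names, "susie")
print(total)
```

The table has to be kept up by hand, but nothing gets counted unless you explicitly listed it.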




