Counting elements in a list wildcard

Mon Apr 24 19:56:47 EDT 2006

> Behalf Of hawkesed
> If I have a list, say of names. And I want to count all the people
> named, say, Susie, but I don't care exactly how they spell it (ie,
> Susy, Susi, Susie all work.) how would I do this? Set up a regular
> expression inside the count? Is there a wildcard variable I can use?
> Here is the code for the non-fuzzy way:
> lstNames.count("Susie")
> Any ideas? Is this something you wouldn't expect count to do?
> Thanks y'all from a newbie.

If there are specific spellings you want to allow, you could just create a
list of them and see if your Suzy is in there:

>>> possible_suzys = [ 'Susy', 'Susi', 'Susie' ]
>>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Jane' ]
>>> for line in my_strings:
...     if line in possible_suzys: print(line)
...
Susi
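
Since the original question was about counting rather than printing, here is
a minimal sketch of the same membership test used as a counter. The helper
name count_any is my own invention for illustration, not a built-in; the two
lists are the ones from the session above.

```python
# Spellings that should all be treated as the same name.
possible_suzys = set(['Susy', 'Susi', 'Susie'])
my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Jane']

def count_any(names, spellings):
    """Count the items in names that exactly equal any of the spellings.

    Hypothetical helper: a multi-spelling stand-in for list.count().
    """
    return sum(1 for name in names if name in spellings)

suzy_count = count_any(my_strings, possible_suzys)
print(suzy_count)
```

The comparison is still exact and case-sensitive, so 'susie' or 'Suzie' would
not be counted unless you add them to the set.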

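I think a more general solution to this problem is to use edit distance (also
called Levenshtein distance): the smallest number of single-character
insertions, deletions, and substitutions needed to turn one string into the
other. There is an implementation in Python at this Wiki:
http://en.wikisource.org/wiki/Levenshtein_distance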

You could use this distance function, and normalize for string length using
the following score function:

def score( a, b ):
    "Calculates the similarity score of the two strings based on edit distance."
    high_len = max( len(a), len(b) )
    return float( high_len - distance( a, b ) ) / float( high_len )

>>> for line in my_strings:
...     if score( line, 'Susie' ) > .75: print(line)
...
Susi
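
To put the pieces together, here is a self-contained sketch that counts the
fuzzy matches instead of printing them. The distance function below is my own
textbook-style dynamic-programming version, not the code from the Wiki page,
and count_similar is a hypothetical helper name. I have also assumed that two
empty strings should score 1.0, which avoids dividing by zero.

```python
def distance(a, b):
    """Levenshtein distance: the fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # previous[j] holds the distance from the current prefix of a to b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]  # turning a[:i] into '' takes i deletions
        for j, cb in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def score(a, b):
    """Similarity from 0.0 (nothing in common) to 1.0 (identical)."""
    high_len = max(len(a), len(b))
    if high_len == 0:
        return 1.0  # assumption: two empty strings count as identical
    return float(high_len - distance(a, b)) / high_len

def count_similar(names, target, threshold=0.75):
    """Hypothetical helper: count names scoring above threshold."""
    return sum(1 for name in names if score(name, target) > threshold)

my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Jane']
print(count_similar(my_strings, 'Susie'))
```

One caveat on the threshold: 'Susy' is two edits away from 'Susie', so it
scores 0.6 and would not be counted at .75. If you want that spelling to
match, you would need a lower cutoff, at the risk of letting in more
unrelated names.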

--
Regards,
Ryan Ginstrom