Counting elements in a list wildcard

Dave Hughes dave at waveform.plus.com
Mon Apr 24 22:59:45 EDT 2006


hawkesed wrote:

> If I have a list, say of names. And I want to count all the people
> named, say, Susie, but I don't care exactly how they spell it (ie,
> Susy, Susi, Susie all work.) how would I do this? Set up a regular
> expression inside the count? Is there a wildcard variable I can use?
> Here is the code for the non-fuzzy way:
> lstNames.count("Susie")
> Any ideas? Is this something you wouldn't expect count to do?
> Thanks y'all from a newbie.
> Ed

You might want to check out the SoundEx and MetaPhone algorithms which
provide approximations of the "sound" of a word based on spelling
(assuming English pronunciations).

Apparently a soundex module used to be built into Python but was
removed in 2.0. You can find several implementations on the 'net, for
example:

http://orca.mojam.com/~skip/python/soundex.py
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52213
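
To give a feel for what such a module does, here is a minimal sketch of the classic (American) Soundex scheme. It is not the code from either link above; the names `soundex` and `_CODES` are my own, and the handling of 'h' and 'w' follows the commonly described rules rather than any particular implementation.

```python
# Letter groups of the classic Soundex scheme: each group maps to one digit.
_CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(name):
    """Return a four-character Soundex code for name (illustrative sketch)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()            # the first letter is kept as-is
    previous = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")        # vowels, h, w and y have no digit
        if digit and digit != previous:
            code += digit
        # 'h' and 'w' do not break a run of identical codes; vowels do.
        if c not in "hw":
            previous = digit
    return (code + "000")[:4]            # pad or truncate to four characters


names = ["Bob", "Sally", "Susi", "Dick", "Susy", "Jane", "Susie"]
target = soundex("Susie")
susie_count = sum(1 for n in names if soundex(n) == target)
print(susie_count)  # 3
```

All three spellings collapse to the same code (S200), so counting becomes an equality test on the codes rather than on the names themselves.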

MetaPhone is generally considered better than SoundEx for "sounds-like"
matching, although it's considerably more complex (IIRC, although it's
been a long time since I wrote an implementation of either in any
language). One Python MetaPhone implementation (there must be more than
this one?):

http://joelspeters.com/awesomecode/

Another algorithm that might interest you isn't based on "sounds-like"
but instead computes the number of transforms necessary to get from one
word to another: the Levenshtein distance. A C-based implementation
(with Python interface) is available:

http://trific.ath.cx/resources/python/levenshtein/
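
If you'd rather not install a C extension, the distance is easy enough to compute in pure Python. The following is a rough sketch using the usual dynamic-programming approach; the function name `levenshtein` is my own and is not the API of the module linked above.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


my_strings = ["Bob", "Sally", "Susi", "Dick", "Susy", "Jane"]
close = [s for s in my_strings if levenshtein(s.lower(), "susy") <= 1]
print(close)  # ['Susi', 'Susy']
```

The threshold needs some care with short names: 'Sally' is only three edits away from 'Susy', so a loose cut-off lets in names that don't sound alike at all.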

Whichever algorithm you go with, you'll wind up with some sort of
"similar" function which could be applied in a similar manner to Ben's
example (I've just mocked up the following -- it's not an actual
session):

    >>> import soundex
    >>> import metaphone
    >>> import levenshtein
    >>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane' ]
    >>> found_suzys = [s for s in my_strings
    ...                if soundex.sounds_similar(s, 'Susy')]
    >>> found_suzys = [s for s in my_strings
    ...                if metaphone.sounds_similar(s, 'Susy')]
    >>> found_suzys = [s for s in my_strings
    ...                if levenshtein.distance(s, 'Susy') < 2]
    >>> found_suzys
    ['Susi', 'Susy'] (one hopes anyway!)
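
To get back to the original question (a fuzzy version of count), any "similar" function can be plugged into a small counting helper. In the sketch below I've used difflib.SequenceMatcher from the standard library as a stand-in, which is a different technique from the three algorithms mentioned above; the names `similar` and `fuzzy_count` and the 0.6 cut-off are assumptions of mine, not anything standard.

```python
from difflib import SequenceMatcher


def similar(a, b, cutoff=0.6):
    """Stand-in similarity test: case-insensitive difflib ratio vs. a cut-off."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= cutoff


def fuzzy_count(items, target, is_similar=similar):
    """Count the items that is_similar considers a match for target."""
    return sum(1 for item in items if is_similar(item, target))


lst_names = ["Bob", "Sally", "Susi", "Dick", "Susy", "Jane", "Susie"]
print(lst_names.count("Susie"))         # 1 (exact matches only)
print(fuzzy_count(lst_names, "Susie"))  # 3
```

Passing the predicate as an argument means a Soundex, MetaPhone or Levenshtein based test can be swapped in later without touching the counting code.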


HTH,

Dave.