locating strings approximately

John Machin sjmachin at lexicon.net
Wed Jun 28 20:00:36 EDT 2006


On 29/06/2006 9:28 AM, BBands wrote:
> I'd like to see if a string exists, even approximately, in another. For
> example if "black" exists in "blakbird" or if "beatles" exists in
> "beatlemania". The application is to look through a long list of songs
> and return any approximate matches along with a confidence factor. I
> have looked at edit distance, but that isn't a good choice for finding
> a short string in a longer one.

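There is only a trivial difference between the traditional 
distance-matrix-based Levenshtein algorithm for edit distance and the 
corresponding one for approximate string searching: for searching, the 
row for the empty pattern prefix is initialised to zeros, so a match may 
start at any position in the text, and you take the minimum over the last 
row instead of the single bottom-right cell. Ditto for the 
finite-state-machine approaches. Ditto for the modern bit-bashing 
(bit-parallel) approaches.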
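A rough sketch of the matrix version in Python, assuming unit costs for insertion, deletion and substitution. It keeps one column of the matrix at a time; the entry for the empty pattern prefix is 0 in every column, which is the "match may start anywhere" change. The names approx_search and find_approx are hypothetical, and the confidence formula (len(pattern) - distance) / len(pattern) is an assumption, not something taken from the question.

```python
def approx_search(pattern, text):
    """Return (best_distance, end_index): the smallest edit distance between
    pattern and any substring of text, and where that substring ends."""
    m = len(pattern)
    # prev[i] = distance between pattern[:i] and the best substring of text
    # ending at the previous column; column 0 is the empty text prefix.
    prev = list(range(m + 1))
    best, best_end = prev[m], 0
    for j, tc in enumerate(text, start=1):
        # The only change from plain edit distance: matching the empty
        # pattern prefix costs nothing, so a match may start at any column.
        cur = [0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == tc else 1
            cur.append(min(prev[i] + 1,           # extra text character
                           cur[i - 1] + 1,        # pattern character dropped
                           prev[i - 1] + cost))   # match or substitution
        if cur[m] < best:
            best, best_end = cur[m], j
        prev = cur
    return best, best_end


def find_approx(pattern, text):
    """Hypothetical confidence score in [0, 1]: 1.0 means an exact occurrence."""
    if not pattern:
        return 1.0
    dist, _ = approx_search(pattern, text)
    return max(0, len(pattern) - dist) / len(pattern)


print(round(find_approx("beatles", "beatlemania"), 3))  # 0.857
```

For "beatles" in "beatlemania" the best substring is "beatle" (one deletion), giving 6/7, which happens to match the 0.857 in the question.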

> I have also explored
> difflib.SequenceMatcher and .get_close_matches, but what I'd really
> like is something like:
> 
> a = FindApprox("beatles", "beatlemania")
> print a
> 0.857
> 
> Any ideas?

You got no ideas from googling "approximate string search python"???
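Since the question mentions difflib.SequenceMatcher, another rough option (a different technique from the edit-distance search, and only an assumption about what might suit) is to slide a window the length of the pattern along the text and keep the best SequenceMatcher.ratio(). The helper name find_approx_difflib is hypothetical.

```python
from difflib import SequenceMatcher


def find_approx_difflib(pattern, text):
    """Hypothetical helper: best SequenceMatcher ratio between pattern and
    any window of text that has the same length as pattern."""
    n = len(pattern)
    if n == 0 or len(text) <= n:
        # Nothing to slide over; compare the two strings directly.
        return SequenceMatcher(None, pattern, text).ratio()
    best = 0.0
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        best = max(best, SequenceMatcher(None, pattern, window).ratio())
    return best


print(round(find_approx_difflib("beatles", "beatlemania"), 3))  # 0.857
```

Fixed-length windows are crude: a true match that needs an inserted or deleted character is scored against a window of the wrong length, which the edit-distance search handles naturally.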
