locating strings approximately

John Machin sjmachin at lexicon.net
Wed Jun 28 20:52:40 EDT 2006


On 29/06/2006 10:07 AM, BBands wrote:
> On 6/28/06, John Machin <sjmachin at lexicon.net> wrote:
>> On 29/06/2006 9:28 AM, BBands wrote:
>> > I'd like to see if a string exists, even approximately, in another. For
>> > example if "black" exists in "blakbird" or if "beatles" exists in
>> > "beatlemania". The application is to look through a long list of songs
>> > and return any approximate matches along with a confidence factor. I
>> > have looked at edit distance, but that isn't a good choice for finding
>> > a short string in a longer one.
>>
>> There is a trivial difference between the traditional
>> distance-matrix-based Levenshtein algorithm for edit distance and the
>> corresponding one for approximate string searching. Ditto between
>> finite-state-machine approaches. Ditto between modern bit-bashing
>> approaches.
>>
>> > I have also explored
>> > difflib.SequenceMatcher and .get_close_matches, but what I'd really
>> > like is something like:
>> >
>> > a = FindApprox("beatles", "beatlemania")
>> > print a
>> > 0.857
>> >
>> > Any ideas?
>>
>> You got no ideas from googling "approximate string search python"???
> 
> Yes, many including agrepy and soundex in addition to those I
> mentioned already, but none seem really handy at approximately looking
> up smaller strings in larger ones. I also note that this has been the
> topic of prior discussion without resolution.
> 
>    jab

It helps if you tell all that you've done. Otherwise people will tell 
you to do what you've done already :-)

It helps if you reply on-list so that others can see. You may get better 
answers sooner. I have to vanish now. Will check back tonight.

Cheers,
John

agrepy = approximate-grep-python -- why doesn't that suit you?
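
To make the "trivial difference" mentioned above concrete, here is a minimal sketch of the distance-matrix approach. It is not agrepy's API or code from this thread. The names last_row, edit_distance and find_approx are hypothetical, and the confidence formula (1 - best_distance / len(pattern)) is an assumption, chosen only because it happens to reproduce the 0.857 in the example above. The one change that turns edit distance into approximate searching is initialising the first row of the matrix to zeros, so that a match may start anywhere in the text, and then taking the minimum of the last row, so that it may end anywhere.

```python
def last_row(pattern, text, search=False):
    """Return the final row of the Levenshtein matrix for pattern vs. text.

    search=False: first row is 0, 1, 2, ... (classic edit distance).
    search=True:  first row is all zeros, so a match may begin at any
                  position in text (approximate substring search).
    """
    prev = [0 if search else j for j in range(len(text) + 1)]
    for i, pc in enumerate(pattern, start=1):
        cur = [i]  # cost of matching pattern[:i] against an empty text
        for j, tc in enumerate(text, start=1):
            cur.append(min(
                prev[j] + 1,               # drop a pattern character
                cur[j - 1] + 1,            # absorb an extra text character
                prev[j - 1] + (pc != tc),  # match or substitute
            ))
        prev = cur
    return prev


def edit_distance(a, b):
    """Whole-string Levenshtein distance: bottom-right cell of the matrix."""
    return last_row(a, b)[-1]


def find_approx(pattern, text):
    """Hypothetical confidence score in [0, 1] (an assumption, see above).

    Uses the smallest distance between pattern and any substring of text,
    scaled by the pattern length.
    """
    if not pattern:
        return 1.0
    best = min(last_row(pattern, text, search=True))
    return 1.0 - float(best) / len(pattern)


if __name__ == "__main__":
    print(edit_distance("beatles", "beatlemania"))         # 5
    print(round(find_approx("beatles", "beatlemania"), 3))  # 0.857
    print(round(find_approx("black", "blakbird"), 3))       # 0.8
```

Whole-string edit distance penalises "beatlemania" for its extra length (distance 5), while the search variant only charges for the unmatched "s" (distance 1). This runs in O(len(pattern) * len(text)) time per song title, which may be slow on a long list; the finite-state and bit-parallel approaches mentioned above are the usual remedies.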


