String matching based on sound?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Jan 29 18:04:34 EST 2018


On Mon, 29 Jan 2018 13:28:32 -0900, Israel Brewster wrote:

> In initial searching, I did find the "fuzzy" library, which at first
> glance appeared to be what I was looking for, but it, apparently,
> ignores numbers, with the result that "all 4 one" gave the same output
> as "all in", but NOT the same output as "all 4 1" - even though "all 4
> 1" sounds EXACTLY the same, while "all in" is only similar if you ignore
> the 4.

Before passing the string to the fuzzy matcher, replace each number with
its spelled-out form: "4" -> "four".
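
A minimal sketch of that preprocessing step, using only the standard
library. The names DIGIT_WORDS and spell_out_digits are made up for
illustration, and the sketch only handles single digits, so "42" becomes
"four two" rather than "forty two". The result would then be handed to
whatever phonetic matcher you are using.

```python
import re

# Hypothetical lookup table: single ASCII digits only.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def spell_out_digits(text):
    """Replace each ASCII digit with its English word, one digit at a time."""
    # Pad each word with spaces so that "U2" becomes "U two", not "Utwo".
    spaced = re.sub(r"[0-9]", lambda m: " " + DIGIT_WORDS[m.group()] + " ", text)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(spaced.split())

print(spell_out_digits("all 4 1"))
print(spell_out_digits("all 4 one"))
```

With this, "all 4 1" and "all 4 one" both normalise to "all four one"
before the matcher ever sees them, while "all in" is left unchanged.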

You may want to do other text replacements too, based on sound or visual
similarity, for example mapping "$" to "s" so that Kei$ha matches Keisha.
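
The same idea can be sketched for look-alike symbols. The table below is
an assumption for illustration, not an exhaustive list, and
replace_lookalikes is a made-up name; you would extend the mapping to
suit your own data.

```python
# Hypothetical table of symbols that visually stand in for letters.
LOOKALIKES = str.maketrans({"$": "s", "@": "a"})

def replace_lookalikes(text):
    """Swap look-alike symbols for the letters they imitate."""
    return text.translate(LOOKALIKES)

print(replace_lookalikes("Kei$ha"))
```

Note that a blanket mapping like this also rewrites a "$" that really
means dollars, so it may be best applied only to fields such as names.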


-- 
Steve



