Fuzzy matching of postal addresses
Aaron Bingham
bingham at cenix-bioscience.com
Tue Jan 18 03:09:09 EST 2005
Andrew McLean wrote:
> I have a problem that I suspect isn't unusual and I'm looking to see if
> there is any code available to help. I've Googled without success.
>
> Basically, I have two databases containing lists of postal addresses and
> need to look for matching addresses in the two databases. More
> precisely, for each address in database A I want to find a single
> matching address in database B.
I had a similar problem to solve a while ago. I can't give you my code,
but I used this paper as the basis for my solution (BibTeX entry from
http://citeseer.ist.psu.edu/monge00adaptive.html):
@misc{ monge-adaptive,
  author = "Alvaro E. Monge",
  title = "An Adaptive and Efficient Algorithm for Detecting Approximately Duplicate Database Records",
  url = "citeseer.ist.psu.edu/monge00adaptive.html" }
There is a lot of literature (try a Google search for "approximate
string match") but very little publicly available code in this area,
from what I could gather. Removing punctuation, etc., as others have
suggested in this thread, is _not_sufficient_. Presumably you want to
be able to match typos or phonetic errors as well. This paper's
algorithm deals with those problems quite nicely.
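
To give a rough idea of what approximate matching looks like in Python,
here is a minimal sketch. It is _not_ the algorithm from the paper; it
only uses difflib from the standard library to score string similarity,
which tolerates simple typos but does nothing special for phonetic
errors. The helper names (normalize, best_match), the sample addresses
and the 0.8 threshold are all made up for illustration and would need
tuning on real data.

```python
import difflib
import re


def normalize(address):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    address = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(address.split())


def best_match(address, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar candidate, or None.

    The score is difflib's similarity ratio in [0, 1]. The threshold is an
    arbitrary assumption, not a recommended value.
    """
    target = normalize(address)
    best, best_score = None, 0.0
    for candidate in candidates:
        matcher = difflib.SequenceMatcher(None, target, normalize(candidate))
        score = matcher.ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None


# Hypothetical stand-in for the addresses in "database B".
database_b = [
    "12 High Street, Cambridge",
    "7 Mill Lane, Oxford",
]

# An address from "database A" with a typo and an abbreviation.
print(best_match("12 Hihg St., Cambridge", database_b))
```

Comparing every address in A against every address in B like this is
quadratic, so for large databases you would want some kind of blocking
(e.g. only comparing addresses that share a postcode) before scoring.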
--
--------------------------------------------------------------------
Aaron Bingham
Application Developer
Cenix BioScience GmbH
--------------------------------------------------------------------