Fuzzy matching of postal addresses

Aaron Bingham bingham at cenix-bioscience.com
Tue Jan 18 03:09:09 EST 2005


Andrew McLean wrote:
> I have a problem that I suspect isn't unusual, and I'm looking to see if 
> there is any code available to help. I've Googled without success.
> 
> Basically, I have two databases containing lists of postal addresses and 
> need to look for matching addresses in the two databases. More 
> precisely, for each address in database A I want to find a single 
> matching address in database B.

I had a similar problem to solve a while ago.  I can't give you my code, 
but I used this paper as the basis for my solution (BibTeX entry from 
http://citeseer.ist.psu.edu/monge00adaptive.html):

@misc{ monge-adaptive,
   author = "Alvaro E. Monge",
   title = "An Adaptive and Efficient Algorithm for Detecting Approximately Duplicate Database Records",
   url = "citeseer.ist.psu.edu/monge00adaptive.html" }

There is a lot of literature (try a Google search for "approximate 
string match"), but from what I could gather, very little publicly 
available code in this area.  Removing punctuation, etc., as others have 
suggested in this thread, is _not_ sufficient.  Presumably you want to 
be able to match typos and phonetic errors as well.  This paper's 
algorithm deals with those problems quite nicely.
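To illustrate why normalization alone is not enough, here is a minimal 
sketch of the simplest approximate-matching approach: normalize both 
addresses, then pick the candidate from database B with the smallest 
Levenshtein edit distance.  This is NOT the algorithm from the Monge 
paper; it is a plain, hypothetical illustration.  The function names 
(normalize, levenshtein, similarity, best_match) and the 0.8 threshold 
are assumptions chosen for the example.

```python
import re


def normalize(address):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    address = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(address.split())


def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit distance scaled to [0, 1]; 1.0 means the strings are identical."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def best_match(address, candidates, threshold=0.8):
    """Return the most similar candidate, or None if none reaches threshold.

    The 0.8 default is an illustrative assumption, not a tuned value.
    """
    target = normalize(address)
    best, best_score = None, -1.0
    for candidate in candidates:
        score = similarity(target, normalize(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


database_b = ["221B Baker Street, London", "10 Downing Street, London"]
print(best_match("221B Baker Stret, Londn", database_b))
```

Here the two typos cost an edit distance of 2, so the misspelled address 
still matches "221B Baker Street, London".  Note that this compares every 
address in A against every address in B, which is only practical for 
small databases, and it does nothing for phonetic errors or 
abbreviations such as "St" vs. "Street"; handling those efficiently is 
what the more sophisticated approaches in the literature are for.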

-- 
--------------------------------------------------------------------
Aaron Bingham
Application Developer
Cenix BioScience GmbH
--------------------------------------------------------------------




More information about the Python-list mailing list