Fuzzy matching of postal addresses

John Machin sjmachin at lexicon.net
Tue Jan 18 16:47:39 EST 2005


John Machin wrote:
> Ermmm ... only remove "the" when you are sure it is a whole word. Even
> then it's a dodgy idea. In the first 1000 lines of the nearest address
> file I had to hand, I found these: Catherine, Matthew, Rotherwood,
> Weatherall, and "The Avenue".
>

Partial apologies: I wasn't reading Skip's snippet correctly -- he had
"THE ", I read "THE". So of the list above, only "The Avenue" is a
problem. However, Skip's snippet _does_ do damage wherever a word ends
in "the". Grepping lists of placenames found 25 distinct such names in
the UK, including "The Mythe" and "The Wrythe".
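
To make the damage concrete, here is a minimal sketch. It assumes the snippet amounted to an upper-cased substring replacement of "THE " (the exact form is a guess on my part), and the function names are made up for illustration. A word-boundary regex leaves "Wrythe" alone, though it still strips the leading word from "The Avenue", which is why removing "the" at all remains a dodgy idea.

```python
import re

def naive_strip_the(address):
    # Assumed form of the original snippet: plain substring replacement,
    # which also fires on any word that merely ends in "the".
    return address.upper().replace("THE ", "")

# "THE" as a whole word, plus any whitespace that follows it.
_THE_WORD = re.compile(r"\bTHE\b\s*")

def whole_word_strip_the(address):
    # Remove "THE" only when it stands alone as a word.
    return _THE_WORD.sub("", address.upper()).strip()

print(naive_strip_the("Wrythe Lane"))        # WRYLANE
print(whole_word_strip_the("Wrythe Lane"))   # WRYTHE LANE
print(whole_word_strip_the("The Avenue"))    # AVENUE
```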

Addendum: Given examples in the UK like "Barton in the Beans" (no
kiddin') and "Barton-on-the-Heath", replacing "-" by space seems
indicated.
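
A small sketch of that normalisation step, under the assumption that it runs before any word-level comparison; `normalise_hyphens` is a hypothetical helper name, not something from the thread. It turns hyphens into spaces and collapses runs of whitespace, so hyphenated and spaced spellings of the same placename split into the same words.

```python
def normalise_hyphens(name):
    # Replace hyphens with spaces, then collapse repeated whitespace.
    return " ".join(name.replace("-", " ").split())

print(normalise_hyphens("Barton-on-the-Heath"))   # Barton on the Heath
print(normalise_hyphens("Barton in the Beans"))   # Barton in the Beans
```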



