Street address parsing in Python, again.

John Nagle nagle at animats.com
Fri Jun 4 15:59:21 EDT 2010


John Nagle wrote:

>    The parser at PyParsing:
> 
>      http://pyparsing.wikispaces.com/file/view/streetAddressParser.py
> 
> ..Bad cases...
> 487 E. Middlefield Rd.  -> streetnumber = 487, streetname = E. MIDDLEFIELD
> 487 East Middlefield Road -> streetnumber = 487, streetname = EAST MIDDLEFIELD
> 226 West Wayne Street -> streetnumber = 226, streetname = WEST WAYNE
> New Orchard Road -> streetnumber = , streetname = NEW
> 1 New Orchard Road -> streetnumber = 1 , streetname = NEW
> 390 Park Avenue -> streetnumber =, streetname = 390


   Here's a system that gets all the above cases right: the USC Deterministic
Address Parser.

https://webgis.usc.edu/Services/AddressNormalization/Interactive/DeterministicNormalization.aspx

It parses a street address line by itself, with no city, state, or ZIP code,
so it can't be relying on a big address database.  There's a technical paper

http://gislab.usc.edu/i/publications/gislabtr11.pdf

but it doesn't go into much detail.  Still, now we know a solution
exists.  I've asked USC whether they'll make the code available.
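
To illustrate why the cases above don't need a database, here is a minimal
rule-based sketch.  It is not the USC algorithm and not the PyParsing example.
It is only a guess at the general idea: take the house number off the front,
the street suffix off the end, then an optional leading directional, and call
whatever is left the street name.  The function name, the field names, and the
tiny lookup tables are all made up for illustration, and the outputs it
produces are what I assume the intended splits to be.

```python
# Hypothetical, deliberately tiny lookup tables.  A usable parser would need
# far more complete lists of suffixes and directionals.
SUFFIXES = {
    "RD": "RD", "ROAD": "RD",
    "ST": "ST", "STREET": "ST",
    "AVE": "AVE", "AVENUE": "AVE",
}
DIRECTIONALS = {
    "N": "N", "NORTH": "N",
    "S": "S", "SOUTH": "S",
    "E": "E", "EAST": "E",
    "W": "W", "WEST": "W",
}


def parse_street_line(line):
    """Split one street address line into number, predir, name, suffix."""
    # Normalize: split on whitespace, drop surrounding periods/commas, upper-case.
    tokens = [t.strip(".,").upper() for t in line.split()]
    tokens = [t for t in tokens if t]

    result = {"number": "", "predir": "", "name": "", "suffix": ""}

    # A leading token that starts with a digit is taken as the house number,
    # but only if something is left over to be the street name.
    if len(tokens) > 1 and tokens[0][0].isdigit():
        result["number"] = tokens.pop(0)

    # Work from the right: a trailing known suffix is removed before the name
    # is decided, so "NEW ORCHARD ROAD" keeps "ORCHARD" in the name.
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        result["suffix"] = SUFFIXES[tokens.pop()]

    # A leading directional counts only if a name remains after it.
    if len(tokens) > 1 and tokens[0] in DIRECTIONALS:
        result["predir"] = DIRECTIONALS[tokens.pop(0)]

    # Whatever is left is the street name, however many words it has.
    result["name"] = " ".join(tokens)
    return result


if __name__ == "__main__":
    for example in (
        "487 E. Middlefield Rd.",
        "487 East Middlefield Road",
        "226 West Wayne Street",
        "New Orchard Road",
        "1 New Orchard Road",
        "390 Park Avenue",
    ):
        print(example, "->", parse_street_line(example))
```

The "only if something is left over" checks are what keep a line like
"390 Park Avenue" from losing its street name.  This sketch will still get
plenty of real addresses wrong (unit numbers, post-directionals, streets named
after a suffix, and so on), which is presumably where the real work lies.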

					John Nagle


