[Chicago] Address parser?

Christopher Allan Webber cwebber at imagescape.com
Mon Feb 25 19:14:33 CET 2008


Massimo Di Pierro <mdipierro at cs.depaul.edu> writes:

> You may want to look into this
>
>     http://exogen.case.edu/projects/geopy/
>
> I also have my own which I use for
>
>     http://www.appealmypropertytaxes.com
>
> Mine performs normalization based on the USPS specifications. They
> have a very long document that says you should use AVE and not AVENUE
> or AV., you should use N and not NORTH, etc. My parser works in most
> cases and it is specifically designed to translate addresses into
> web2py database queries. It is not freely available but I can make it
> available to web2py users if there is a need.
>
> Massimo

That sounds pretty neat, and not just for web2py users.  I would love
to see this released as a standalone module.
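
As a rough illustration of the normalization described above (AVE
rather than AVENUE or AV., N rather than NORTH), here is a minimal
sketch. It is not Massimo's parser, which is not shown in this thread.
The function name normalize_address and the two small lookup tables
are assumptions made up for the example; the real USPS tables are far
longer.

```python
import re

# Tiny illustrative subsets, NOT the full USPS abbreviation tables.
SUFFIXES = {"AVENUE": "AVE", "AV": "AVE", "STREET": "ST",
            "ROAD": "RD", "BOULEVARD": "BLVD"}
DIRECTIONS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}


def normalize_address(text):
    """Upper-case an address line and abbreviate its direction and suffix."""
    # Drop periods and commas so "Av." and "Av" look the same.
    tokens = re.sub(r"[.,]", " ", text.upper()).split()
    last = len(tokens) - 1
    out = []
    for i, tok in enumerate(tokens):
        if i == last and i >= 2 and tok in SUFFIXES:
            # Only the final word is treated as the street type.
            tok = SUFFIXES[tok]
        elif (i == 1 and i < last and tok in DIRECTIONS
              and tokens[i + 1] not in SUFFIXES):
            # A direction word right after the house number is a prefix,
            # unless it is itself the street name ("North Avenue").
            tok = DIRECTIONS[tok]
        out.append(tok)
    return " ".join(out)


print(normalize_address("320 West Randolph Street"))
print(normalize_address("1600 North Avenue"))
```

The second call keeps NORTH spelled out because there it is the street
name, not a direction prefix. That is the kind of special case a pure
lookup table gets wrong.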

> On Feb 25, 2008, at 11:02 AM, Phil Robare wrote:
>
>> Address parsing is a hard problem.  Not in the theoretical NP sense,
>> but in that it requires a lot of knowledge of special cases.
>> Addresses can be ambiguous or not depending upon information that the
>> application 'just has to know'.  For instance, an address in Chicago of
>> 320 Randolph is ambiguous: it could be East or West.  But an address
>> of 1320 Randolph is merely incomplete, needing West as part of the
>> street name. If the user dropped the space you could figure out where
>> 1320 westrandolph street was.  But a Westmont Street would just be a
>> street named after a suburb.  It would probably be the same as an
>> address on Westmont Ave.  But Atlanta is (in)famous for having
>> multiple different roads all named Peachtree but having different
>> suffixes, e.g. Road, Avenue, Boulevard.  Usually digits are part of
>> the address and words are part of the street name.  Detroit, for
>> example, confounds things with "8 Mile Road". In many places a street
>> has multiple names, bearing both the local name and the highway route
>> name, so you get an address like 185 Rt 45.  There are people with the
>> last name of "Street" that have had a road named after them.  While
>> the block number might be useful for figuring out west or east in
>> Chicago, in the suburbs it can be a mess.  Arlington Heights Road goes
>> through a number of suburbs, many of them having their own numbering
>> system and their own east/west dividing point.  These addresses can be
>> ambiguous because no one knows which suburb they are in as they drive
>> along it. Most addresses are whole numbers but within the US there are
>> a number of places that use fractions (like 1/2) to specify part of a
>> duplex, and there are even places that use decimals in the address in
>> place of apartment numbers.  Another problem is that there are
>> multiple towns with the same name in some states, so the county has to
>> be part of the address (or the zip code has to be checked).
>>
>> So, as far as I know, there are no good public domain address parsers
>> because of the amount of work it takes to create one and the
>> dependence of the parsing upon an underlying map.  If you are a direct
>> marketer mailing hundreds of pieces, the post office parser may be a
>> good choice.  But if you are working for a web retailer who would just
>> like to make sure the user typed an address that can be mailed to, I
>> think the Google API would be an option (depending upon the terms of
>> use; I don't know how restrictive they are with regard to businesses
>> using it).  Navteq and Tele Atlas have commercial offerings that I am
>> not very familiar with.
>>
>> Asking the person entering the data to put in a house number field, a
>> street name, a street type, direction suffix/prefix, etc. can make the
>> job of the coder easier but will frustrate those who have to enter an
>> address that doesn't fit the model.
>>
>> Phil
>> _______________________________________________
>> Chicago mailing list
>> Chicago at python.org
>> http://mail.python.org/mailman/listinfo/chicago
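
Phil's observation that digits usually belong to the address and words
to the street name, with exceptions such as "8 Mile Road" and "1/2"
fractions, can be sketched with a single regular expression. This is
only a sketch under stated assumptions: the function name
split_house_number and the sample addresses are invented for
illustration, and it deliberately ignores everything that needs an
underlying map.

```python
import re

# House number (whole or decimal), optional fraction, then the rest.
ADDRESS_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)"    # house number, e.g. 123 or 45.5
    r"(?:\s+(\d+/\d+))?"      # optional fraction, e.g. 1/2
    r"\s+(.+?)\s*$"           # street: everything that is left
)


def split_house_number(line):
    """Return (number, fraction, street), or None if no leading number."""
    match = ADDRESS_RE.match(line)
    if match is None:
        return None
    return match.groups()


for sample in ("123 1/2 Main St", "20500 8 Mile Road",
               "185 Rt 45", "8 Mile Road"):
    print(sample, "->", split_house_number(sample))
```

The last sample shows the limit of the approach: given only "8 Mile
Road", the pattern takes 8 as the house number and "Mile Road" as the
street. Nothing in the text itself says otherwise, which is why the
parsing ends up depending on knowing which streets actually exist.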

