[Chicago] Address parser?
Christopher Allan Webber
cwebber at imagescape.com
Mon Feb 25 19:14:33 CET 2008
Massimo Di Pierro <mdipierro at cs.depaul.edu> writes:
> You may want to look into this
>
> http://exogen.case.edu/projects/geopy/
>
> I also have my own which I use for
>
> http://www.appealmypropertytaxes.com
>
> Mine performs normalization based on the USPS specifications. They
> have a very long document that says you should use AVE and not AVENUE
> or AV., you should use N and not NORTH, etc. My parser works in most
> cases and it is specifically designed to translate addresses into
> web2py database queries. It is not freely available but I can make it
> available to web2py users if there is a need.
>
> Massimo
That sounds pretty neat, and not just for web2py users. I would love
to see this released as a standalone module.
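The normalization Massimo describes (AVE rather than AVENUE or AV., N rather than NORTH) amounts to mapping each word of the street line onto one standard abbreviation. Below is a minimal Python sketch of that idea. It is not Massimo's parser, which is not public. The names `SUFFIX_MAP`, `DIRECTION_MAP` and `normalize_street` are hypothetical, and the lookup tables are an illustrative assumption that covers only a handful of the variants in the USPS list (presumably Publication 28, Postal Addressing Standards).

```python
# Illustrative subsets only (assumption); the USPS tables list many more variants.
SUFFIX_MAP = {
    "AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "BOULEVARD": "BLVD", "BLVD": "BLVD",
    "ROAD": "RD", "RD": "RD",
}
DIRECTION_MAP = {
    "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    "N": "N", "S": "S", "E": "E", "W": "W",
}


def normalize_street(line: str) -> str:
    """Upper-case, drop periods and commas, and swap long forms for standard abbreviations."""
    tokens = line.upper().replace(".", "").replace(",", " ").split()
    out = []
    for i, token in enumerate(tokens):
        if token in DIRECTION_MAP:
            out.append(DIRECTION_MAP[token])
        elif i == len(tokens) - 1 and token in SUFFIX_MAP:
            # Only the last word is treated as a street type, so a name word
            # such as the "Avenue" in "Avenue Road" is left alone.
            out.append(SUFFIX_MAP[token])
        else:
            out.append(token)
    return " ".join(out)


print(normalize_street("1320 West Randolph Street"))  # 1320 W RANDOLPH ST
print(normalize_street("185 N. Main Av."))            # 185 N MAIN AVE
print(normalize_street("1600 North Avenue"))          # 1600 N AVE (wrong, see below)
```

The last call shows the limits of a pure lookup table. A street that is literally named North Avenue (Chicago has one) gets its name abbreviated to N. This is exactly the kind of special case that Phil describes in the message below.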
> On Feb 25, 2008, at 11:02 AM, Phil Robare wrote:
>
>> Address parsing is a hard problem. Not in the theoretical NP sense,
>> but in that it requires a lot of knowledge of special cases.
>> Addresses can be ambiguous or not depending upon information that the
>> application 'just has to know'. For instance an address in Chicago of
>> 320 Randolph is ambiguous; it could be east or west. But an address
>> of 1320 Randolph is merely incomplete, needing West as part of the
>> street name. If the user dropped the space you could figure out where
>> 1320 westrandolph street was. But a Westmont Street would just be a
>> street named after a suburb. It would probably be the same as an
>> address on Westmont Ave. But Atlanta is (in)famous for having
>> multiple different roads all named Peachtree but having different
>> suffixes, e.g. Road, Avenue, Boulevard. Usually digits are part of
>> the house number and words are part of the street name. Detroit, for
>> example, confounds things with "8 Mile Road". In many places a street
>> has multiple names, bearing both the local name and the highway route
>> name, so you get an address like 185 Rt 45. There are people with the
>> last name of "Street" that have had a road named after them. While
>> the block number might be useful for figuring out west or east in
>> Chicago, in the suburbs it can be a mess. Arlington Heights Road goes
>> through a number of suburbs, many of them having their own numbering
>> system and their own east/west dividing point. These addresses can be
>> ambiguous because no one knows which suburb they are in as they drive
>> along it. Most addresses are whole numbers but within the US there are
>> a number of places that use fractions (like 1/2) to specify part of a
>> duplex, and there are even places that use decimals in the address in
>> place of apartment numbers. Another problem is that there are
>> multiple towns with the same name in some states, so the county has to
>> be part of the address (or the zip code has to be checked).
>>
>> So, as far as I know, there are no good public domain address parsers
>> because of the amount of work it takes to create one and the
>> dependence of the parsing upon an underlying map. If you are a direct
>> marketer mailing hundreds of pieces, the post office parser may be a
>> good choice. But if you are working for a web retailer who would just
>> like to make sure the user typed an address that can be mailed to, I
>> think the Google API would be an option (depending upon the terms of
>> use; I don't know how restricted they are with regard to businesses
>> using it). Navteq and Tele Atlas have commercial offerings that I am
>> not very familiar with.
>>
>> Asking the person entering the data to fill in separate fields for
>> house number, street name, street type, direction prefix/suffix, etc.
>> can make the coder's job easier, but it will frustrate those who have
>> to enter an address that doesn't fit the model.
>>
>> Phil
>> _______________________________________________
>> Chicago mailing list
>> Chicago at python.org
>> http://mail.python.org/mailman/listinfo/chicago
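Phil's rule of thumb (digits are the house number, words are the street name) fits together with his list of separate fields (house number, direction prefix/suffix, street name, street type). The minimal, hypothetical Python sketch below splits a one-line street address into those fields, accepts a fractional house number such as "1320 1/2", and shows where the heuristic fails. It is nobody's actual parser. The names `StreetAddress`, `split_address`, `DIRECTIONS` and `SUFFIXES` are invented for illustration, and the two small word sets are an assumption, not the real USPS tables.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative subsets only (assumption); real abbreviation tables are far larger.
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}
SUFFIXES = {"AVE", "BLVD", "RD", "ST", "DR", "LN", "CT"}

# House number: leading digits, optionally followed by a fraction such as "1/2".
HOUSE_NUMBER = re.compile(r"^\d+(?: \d+/\d+)?")


@dataclass
class StreetAddress:
    house_number: Optional[str]
    prefix_direction: Optional[str]
    street_name: str
    street_type: Optional[str]
    suffix_direction: Optional[str]


def split_address(line: str) -> StreetAddress:
    """Naive split: leading digits are the house number, the words are the street."""
    text = " ".join(line.upper().replace(".", "").split())
    match = HOUSE_NUMBER.match(text)
    house_number = match.group(0) if match else None
    words = text[match.end():].split() if match else text.split()

    # Peel off a leading direction, then a trailing direction, then a street type,
    # always leaving at least one word for the street name itself.
    prefix = words.pop(0) if len(words) > 1 and words[0] in DIRECTIONS else None
    suffix_dir = words.pop() if len(words) > 1 and words[-1] in DIRECTIONS else None
    street_type = words.pop() if len(words) > 1 and words[-1] in SUFFIXES else None
    return StreetAddress(house_number, prefix, " ".join(words), street_type, suffix_dir)


for sample in ["1320 1/2 W. Randolph St.", "320 Randolph", "8 Mile Rd", "185 Rt 45"]:
    print(sample, "->", split_address(sample))
```

Here "320 Randolph" comes back with no direction at all. Nothing in the string says east or west, so resolving it needs the underlying map Phil mentions. "8 Mile Rd" is misread as house number 8 on MILE RD. "185 Rt 45" happens to work only because the route name is left intact as the street name.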