newbie question: parsing street name from address

Paul McGuire ptmcg at austin.rr.com
Thu Jun 21 12:55:05 EDT 2007


On Jun 21, 8:47 am, cjl <cjl... at gmail.com> wrote:
> P:
>
> I am working on a project that requires geocoding, and have written a
> very simple geocoder that uses the Google service.
>
> I would like to be able to extract the name of the street from the
> addresses in my data, however they vary significantly. Here are some
> examples:
>
> 25 Main St
> 2500 14th St
> 12 Bennet Pkwy
> Pearl St
> Bennet Rd and Main st
> 19th St
>
> As you can see, sometimes I have the house number, and sometimes I do
> not. Sometimes the street name is a number. Sometimes I simply have
> the names of intersecting streets.
>
> I would like to be able to parse the above into the following:
>
> Main St
> 14th St
> Bennet Pkwy
> Pearl St
> Bennet Rd
> Main St
> 19th St
>
> How might I approach this complex parsing problem?
>
> -CJL

Parsing street addresses is a surprisingly complex problem.  Please
look at this example from the pyparsing wiki:

    http://pyparsing.wikispaces.com/space/showimage/streetAddressParser.py

It includes support for these test cases:

    100 South Street
    123 Main
    221B Baker Street
    10 Downing St
    1600 Pennsylvania Ave
    33 1/2 W 42nd St.
    454 N 38 1/2
    21A Deer Run Drive
    256K Memory Lane
    12-1/2 Lincoln
    23N W Loop South
    23 N W Loop South

I added the addresses from your list to the test cases, which broke a
few lines in the grammar.  I have fixed those, and the current online
version now supports your formats as well.  Here is some sample output
from the pyparsing example:

100 South Street
['100', 'South', 'Street']
- name: South
- number: 100
- street: ['100', 'South', 'Street']
  - name: South
  - number: 100
  - type: Street
- type: Street
Street is South

221B Baker Street
['221B', 'Baker', 'Street']
- name: Baker
- number: 221B
- street: ['221B', 'Baker', 'Street']
  - name: Baker
  - number: 221B
  - type: Street
- type: Street
Street is Baker Street

10 Downing St
['10', 'Downing', 'St']
- name: Downing
- number: 10
- street: ['10', 'Downing', 'St']
  - name: Downing
  - number: 10
  - type: St
- type: St
Street is Downing St

1600 Pennsylvania Ave
['1600', 'Pennsylvania', 'Ave']
- name: Pennsylvania
- number: 1600
- street: ['1600', 'Pennsylvania', 'Ave']
  - name: Pennsylvania
  - number: 1600
  - type: Ave
- type: Ave
Street is Pennsylvania Ave

33 1/2 W 42nd St.
['33 1/2', 'W 42 nd', 'St']
- name: W 42 nd
- number: 33 1/2
- street: ['33 1/2', 'W 42 nd', 'St']
  - name: W 42 nd
  - number: 33 1/2
  - type: St
- type: St
Street is W 42 nd St

454 N 38 1/2
['454', 'N 38 1/2']
- name: N 38 1/2
- number: 454
- street: ['454', 'N 38 1/2']
  - name: N 38 1/2
  - number: 454
Street is N 38 1/2

25 Main St
['25', 'Main', 'St']
- name: Main
- number: 25
- street: ['25', 'Main', 'St']
  - name: Main
  - number: 25
  - type: St
- type: St
Street is Main St

2500 14th St
['2500', '14 th', 'St']
- name: 14 th
- number: 2500
- street: ['2500', '14 th', 'St']
  - name: 14 th
  - number: 2500
  - type: St
- type: St
Street is 14 th St

12 Bennet Pkwy
['12', 'Bennet', 'Pkwy']
- name: Bennet
- number: 12
- street: ['12', 'Bennet', 'Pkwy']
  - name: Bennet
  - number: 12
  - type: Pkwy
- type: Pkwy
Street is Bennet Pkwy

Pearl St
['Pearl', 'St']
- name: Pearl
- street: ['Pearl', 'St']
  - name: Pearl
  - type: St
- type: St
Street is Pearl St

Bennet Rd and Main St
['Bennet', 'Rd', 'and', 'Main', 'St']
- crossStreet: ['Bennet', 'Rd']
  - name: Bennet
  - type: Rd
- name: Main
- street: ['Main', 'St']
  - name: Main
  - type: St
- type: St
Street is Main St

19th St
['19 th', 'St']
- name: 19 th
- street: ['19 th', 'St']
  - name: 19 th
  - type: St
- type: St
Street is 19 th St
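
If you only need the six formats from your original list, a much cruder
approach can be sketched with the standard re module.  To be clear, this
is not the pyparsing grammar from the wiki, just a rough illustration:
the suffix list, the name street_names, and the rule of splitting
intersections on the word "and" are all assumptions made for this
sketch, and it will not cope with the harder cases above (fractions,
directional prefixes, missing street types).

```python
import re

# Assumed, deliberately short list of street-type suffixes; real data needs more.
STREET_TYPES = r"(?:St|Street|Rd|Road|Ave|Avenue|Pkwy|Dr|Drive|Ln|Lane|Blvd)"

STREET_RE = re.compile(
    r"^(?:(?P<number>\d+[A-Za-z]?)\s+)?"      # optional house number: "25", "221B"
    r"(?P<name>.+?)\s+"                       # street name, as short as possible
    r"(?P<type>" + STREET_TYPES + r")\.?$",   # street type, optional trailing "."
    re.IGNORECASE,
)

def street_names(address):
    """Return the street names found in one address line (hypothetical helper)."""
    names = []
    # An intersection such as "Bennet Rd and Main st" holds two streets.
    for part in re.split(r"\s+and\s+", address.strip(), flags=re.IGNORECASE):
        m = STREET_RE.match(part)
        if m:
            # Normalize the suffix case so "st" comes back as "St".
            names.append("%s %s" % (m.group("name"), m.group("type").capitalize()))
    return names

if __name__ == "__main__":
    for line in ["25 Main St", "2500 14th St", "12 Bennet Pkwy",
                 "Pearl St", "Bennet Rd and Main st", "19th St"]:
        print(line, "->", street_names(line))
```

Note that "19th St" is not mistaken for house number 19, because the
number is only accepted when whitespace follows it.  An address with no
recognized street type, such as "123 Main", simply returns an empty
list, which is one reason a real grammar is the better long-term choice.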


-- Paul



