Separate Address number and name

Anders Wegge Keller wegge at wegge.dk
Tue Jan 21 20:04:49 EST 2014


Shane Konings <shane.konings at gmail.com> writes:

...

> The following is a sample of the data. There are hundreds of lines
> that need to have an automated process of splitting the strings into
> headings to be imported into Excel with these headings

> ID  Address  StreetNum  StreetName  SufType  Dir   City  Province  PostalCode
> 
> 
> 1	1067 Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0
> 2	4260 Mountainview Rd, Lincoln, ON L0R 1B2
> 3	25 Hunter Rd, Grimsby, E, ON L3M 4A3
> 4	1091 Hutchinson Rd, Haldimand, ON N0A 1K0
> 5	5172 Green Lane Rd, Lincoln, ON L0R 1B3
> 6	500 Glenridge Ave, East, St. Catharines, ON L2S 3A1
> 7	471 Foss Rd, Pelham, ON L0S 1C0
> 8	758 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
> 9	3836 Main St, North, Lincoln, ON L0R 1S0
> 10	1025 York Rd, W, Niagara-On-The-Lake, ON L0S 1P0

 The input doesn't look consistent to me. Is Dir supposed to be an
optional value? If that is the only optional field, it can be worked
around. But if the missing direction (I'm guessing that is what Dir
means) is due to malformed input data, you have a hell of a job in
front of you.

 What do you want to do with incomplete or malformed data? Try to
parse it as a "best effort", or simply spew out an error message for
an operator to look at?

 In the latter case, I suggest a stepwise approach:

* Split input by ',' and strip whitespace from each piece -> res0

* Split the first result on whitespace (the ID is tab-separated) -> res

-> Id = res[0]
-> Address = res[1:]
-> StreetNum = res[1]
-> StreetName = res[2:-1]
-> SufType = res[-1]

* Check if res0[1] looks like a cardinal direction
 If so, Dir = res0[1]
 Otherwise, croak or use the default direction, inserting it into res0
 at index 1 so the remaining fields shift to the positions the
 following steps expect.

-> City = res0[2]

* Split res0[3] on whitespace -> respp

respp[0] -> Province
respp[1:] -> PostalCode (rejoined with a space)
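A minimal Python sketch of the steps above, assuming every input line looks like the sample (ID, a tab, then the comma-separated address). The DIRECTIONS set, the default_dir parameter and the function names parse_line, parse_all and to_csv are hypothetical, chosen only for illustration. Lines that don't fit are collected with an error message for an operator instead of being guessed at, and the good records are rendered as CSV, which Excel can import.

```python
import csv
import io

# Hypothetical set of tokens treated as a direction, based on the sample
# (W, E, East, North), not on any specification of the data.
DIRECTIONS = {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"}

FIELDS = ["ID", "Address", "StreetNum", "StreetName", "SufType",
          "Dir", "City", "Province", "PostalCode"]


def parse_line(line, default_dir=None):
    """Parse one 'ID<TAB>address' line into a dict or raise ValueError.

    default_dir=None means croak on a missing direction; any other value
    (hypothetical parameter) is inserted as the direction instead.
    """
    # Split by ',' and strip the blanks that follow each comma.
    res0 = [part.strip() for part in line.split(",")]
    if len(res0) < 3:
        raise ValueError(f"too few comma-separated fields: {line!r}")

    # Split on any whitespace so the tab after the ID is a separator too.
    res = res0[0].split()
    if len(res) < 4:
        raise ValueError(f"cannot split street part: {res0[0]!r}")
    record = {
        "ID": res[0],
        "Address": " ".join(res[1:]),
        "StreetNum": res[1],
        "StreetName": " ".join(res[2:-1]),  # excludes the suffix
        "SufType": res[-1],
    }

    # Optional direction: keep it, croak, or insert the default so the
    # remaining fields shift to the positions expected below.
    if res0[1].upper() in DIRECTIONS:
        record["Dir"] = res0[1]
    elif default_dir is None:
        raise ValueError(f"no direction found: {line!r}")
    else:
        res0.insert(1, default_dir)
        record["Dir"] = default_dir

    if len(res0) != 4:
        raise ValueError(f"unexpected field layout: {line!r}")
    record["City"] = res0[2]

    respp = res0[3].split()
    if len(respp) < 2:
        raise ValueError(f"cannot split province/postal code: {res0[3]!r}")
    record["Province"] = respp[0]
    record["PostalCode"] = " ".join(respp[1:])
    return record


def parse_all(lines, default_dir=None):
    """Return (records, errors); errors are (line, message) pairs for an operator."""
    records, errors = [], []
    for line in lines:
        if not line.strip():
            continue
        try:
            records.append(parse_line(line, default_dir))
        except ValueError as exc:
            errors.append((line, str(exc)))
    return records, errors


def to_csv(records):
    """Render records as CSV text with a header row, e.g. for Excel import."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        "1\t1067 Niagara Stone Rd, W, Niagara-On-The-Lake, ON L0S 1J0",
        "3\t25 Hunter Rd, Grimsby, E, ON L3M 4A3",
        "7\t471 Foss Rd, Pelham, ON L0S 1C0",
    ]
    records, errors = parse_all(sample, default_dir="")
    print(to_csv(records), end="")
    for line, message in errors:
        print("REJECTED:", message)
```

With default_dir="" the Foss Rd line gets an empty Dir, while the Hunter Rd line, whose "E" comes after the city, is still rejected: inserting a default there leaves five fields instead of four, which is exactly the malformed-input case that needs a human.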


 And put in some basic sanity checks on the resulting values before
committing them as a parsed result. Provinces and post codes should
be easy enough to validate against a fixed list.
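A sketch of such checks, assuming each parsed record is a dict with hypothetical "Province" and "PostalCode" keys. The province set is Canada Post's thirteen two-letter province and territory abbreviations. For postal codes it swaps the fixed-list lookup for a permissive format check with a regular expression (letter digit letter, space, digit letter digit), which catches typos but cannot tell whether a code is actually assigned.

```python
import re

# Canada Post two-letter abbreviations for the provinces and territories.
PROVINCES = {"AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU",
             "ON", "PE", "QC", "SK", "YT"}

# Format-only check, e.g. "L0S 1J0"; a real list of assigned codes would
# be stricter than this pattern.
POSTAL_RE = re.compile(r"[A-Z]\d[A-Z] \d[A-Z]\d")


def validate(record):
    """Return a list of problems in a parsed record (empty if it looks sane)."""
    problems = []
    province = record.get("Province", "").upper()
    if province not in PROVINCES:
        problems.append(f"unknown province {record.get('Province')!r}")
    postal = record.get("PostalCode", "").upper()
    if not POSTAL_RE.fullmatch(postal):
        problems.append(f"malformed postal code {record.get('PostalCode')!r}")
    return problems


if __name__ == "__main__":
    print(validate({"Province": "ON", "PostalCode": "L0S 1J0"}))
    print(validate({"Province": "XX", "PostalCode": "L0S1J0"}))
```

Records with a non-empty problem list can go on the same operator queue as lines that failed to parse.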

-- 
/Wegge

Looking for redundant peering of dk.*,linux.debian.*


