Unwanted Spaces and Iterative Loop

matt.s.marotta at gmail.com
Sun Jan 26 20:15:20 EST 2014


On Sunday, 26 January 2014 19:40:26 UTC-5, Steven D'Aprano  wrote:
> On Sun, 26 Jan 2014 13:46:21 -0800, matt.s.marotta wrote:
> 
> 
> 
> > I have been working on a python script that separates mailing addresses
> 
> > into different components.
> 
> > 
> 
> > Here is my code:
> 
> > 
> 
> > inFile = "directory"
> > outFile = "directory"
> > inHandler = open(inFile, 'r')
> > outHandler = open(outFile, 'w')
> 
> 
> 
> Are you *really* opening the same file for reading and writing at the 
> 
> same time?
> 
> 
> 
> Even if your operating system allows that, surely it's not a good idea. 
> 
> You might get away with it for small files, but at some point you're 
> 
> going to run into weird, hard-to-diagnose bugs.
> 
> 
> 
> 
> 
> > outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
> 
> 
> 
> This looks like a CSV file using tabs as the separator. You really ought 
> 
> to use the csv module.
> 
> 
> 
> http://docs.python.org/3/library/csv.html
> 
> http://docs.python.org/2/library/csv.html
> 
> 
> 
> http://pymotw.com/2/csv/
> 
> 
> 
> 
> 
> > for line in inHandler:
> >     str = line.replace("FarmID\tAddress", " ")
> >     outHandler.write(str[0:-1])
> >     str = str.replace(" ","\t", 1)
> >     str = str.replace(" Rd,","\tRd\t\t")
> >     str = str.replace(" Rd","\tRd\t")
> >     str = str.replace("Ave,","\tAve\t\t")
> >     str = str.replace("Ave","\tAve\t\t")
> >     str = str.replace("St ","\tSt\t\t")
> >     str = str.replace("St,","\tSt\t\t")
> >     str = str.replace("Dr,","\tDr\t\t")
>       [snip additional string manipulations]
> >     str = str.replace(",","\t")
> >     str = str.replace(" ON","ON\t")
> >     outHandler.write(str)
> 
> 
> 
> 
> 
> Aiy aiy aiy, what a mess! I get a headache just trying to understand it!
> 
> 
> 
> The first question that comes to mind is that you appear to be writing 
> 
> each input line *twice*, first after a very minimal set of string 
> 
> manipulations (you convert the literal string "FarmID\tAddress" to a 
> 
> space, then write the whole line out), the second time after a whole mess 
> 
> of string replacements. Why?
> 
> 
> 
> If the sample data you show below is accurate, I *think* what you are 
> 
> trying to do is simply suppress the header line. The first line in the 
> 
> input file is:
> 
> 
> 
> FarmID	Address
> 
> 
> 
> and rather than write that you want to write a space. I don't know why 
> 
> you want the output file to begin with a space, but this would be better:
> 
> 
> 
> for line in inHandler:
>     line = line.strip()  # Remove any leading and trailing whitespace,
>         # including the trailing newline. Later, we'll add a newline
>         # back in.
>     if line == "FarmID\tAddress":
>         outHandler.write(" ")  # Write a mysterious space.
>         continue  # And skip to the next line.
>     # Now process the non-header lines.
> 
> 
> 
> 
> 
> Now, as far as the non-header lines, you do a whole lot of complex string 
> 
> manipulations, replacing chunks of text with or without tabs or commas to 
> 
> the same text with or without tabs but in a different order. The logic of 
> 
> these manipulations completely escapes me: what are you actually trying to 
> 
> do here?
> 
> 
> 
> I *strongly* suggest that you don't try to implement your program logic 
> 
> in the form of string manipulations. According to your sample data, your 
> 
> data looks like this:
> 
> 
> 
> 1	1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
> 
> 
> 
> i.e. 
> 
> 
> 
> farmId TAB address COMMA district COMMA postcode
> 
> 
> 
> It is much better to pull the line apart into named components, 
> 
> manipulate the components directly, then put it back together in the 
> 
> order you want. This makes the code more understandable, and easier to 
> 
> change if you ever need to change things.
> 
> 
> 
> for line in inHandler:
>     line = line.strip()
>     if line == "FarmID\tAddress":
>         outHandler.write(" ")  # Write a mysterious space.
>         continue
>     # Now process the non-header lines.
>     farmid, address = line.split("\t")
>     farmid = farmid.strip()
>     address, district, postcode = address.split(",")
>     address = address.strip()
>     district = district.strip()
>     postcode = postcode.strip()
>     # Now process the fields however you like.
>     parts_of_address = address.split(" ")
>     street_number = parts_of_address[0]  # first part
>     street_type = parts_of_address[-1]  # last part
>     street_name = parts_of_address[1:-1]  # everything else
>     street_name = " ".join(street_name)
> 
> 
> 
> and so on for the post code. Then, at the very end, assemble the parts 
> 
> you want to write out, join them with tabs, and write:
> 
> 
> 
>     fields = [farmid, street_number, street_name, street_type, ... ]
>     outHandler.write("\t".join(fields))
>     outHandler.write("\n")
> 
> 
> 
> 
> 
> Or use the csv module to do the actual writing. It will handle escaping 
> 
> anything that needs escaping, newlines, tabs, etc.
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> Steven

I'm not reading and writing to the same file; I just replaced the actual paths with "directory".

This is for a school assignment, and we haven't been taught any of the stuff you're talking about. Although I appreciate your help, everything needs to stay as is, and I just need to create the loop to get rid of the FarmID from the end of the postal codes.
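
For reference, the split-and-rejoin approach from the quoted reply, combined with the csv module it recommends, might look roughly like the sketch below. It assumes Python 3 and that every data line has exactly the shape of the one sample line quoted above (FarmID, a tab, then three comma-separated parts). The function name split_record, the in-memory output buffer, and the step that splits "ON L0S 1J0" into province and postal code are illustrative assumptions and do not come from the original posts.

```python
import csv
import io


def split_record(line):
    """Split 'FarmID<TAB>street, district, province postcode' into fields.

    Hypothetical helper: assumes exactly one tab and exactly two commas,
    as in the single sample line shown in the thread.
    """
    farmid, rest = line.strip().split("\t")
    address, district, postcode = [part.strip() for part in rest.split(",")]

    parts = address.split(" ")
    street_number = parts[0]             # first word
    street_type = parts[-1]              # last word, e.g. "Rd"
    street_name = " ".join(parts[1:-1])  # everything in between

    # Assumption: the province is the first word of the last part.
    province, postal_code = postcode.split(" ", 1)

    return [farmid.strip(), street_number, street_name, street_type,
            district, province, postal_code]


sample = "1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"

# Write to an in-memory buffer here; a real script would pass an open file.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(split_record(sample))
result = out.getvalue()
```

Because each input line becomes exactly one call to writerow, each record is written once, with the FarmID only in the first column. The field list here omits the Dir column from the original header, since the sample line contains no direction.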


