[Tutor] Matching zipcode in address file

bob gailer bgailer at gmail.com
Mon Apr 5 14:19:56 CEST 2010


On 4/5/2010 1:15 AM, TGW wrote:
>> Sorry - my mistake - try:
>>
>> infile = open("filex")
>> match_zips = open("zippys")
>> result = [line for line in infile if line in match_zips]
>> print result
> When I apply the readlines to the original file, It is taking a lot 
> longer to process and the outfile still remains blank. Any suggestions?

OK - you handled the problem regarding reading to end-of-file. Yes, it 
takes a lot longer, because now you are actually iterating through 
match_zips for each line of infile.

How large are these files? Consider creating a set from match_zips. As 
lists get longer, set membership tests become much faster than list 
membership tests: a list is scanned item by item, while a set does a 
hash lookup.
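
A minimal sketch of the set idea, using made-up in-memory data rather 
than your files (zip_list, zip_set, candidates and the sample values are 
hypothetical, just for illustration):

```python
# Hypothetical sample data standing in for the zip codes read from zips.txt.
zip_list = ["87101", "87102", "88001"]

# Build the set once; each later "in" test is then a hash lookup
# instead of a scan through the whole list.
zip_set = set(zip_list)

# Hypothetical values to test against the set.
candidates = ["87102", "99999"]
matches = [z for z in candidates if z in zip_set]
print(matches)
```

The result is the same as with the list; only the cost of each "in" 
test changes.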

If the outfile is empty that means that line[149:154] is never in 
match_zips.

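I suggest you take a look at match_zips. You will find a list of strings 
of length 6, which cannot match line[149:154], a string of length 5. 
The extra character is presumably the trailing newline that readlines() 
keeps on each line; stripping it should make the lengths agree.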
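
Here is a small sketch of the length mismatch and one possible fix. It 
assumes each line of zips.txt holds one 5-digit zip followed by a 
newline; the sample zips and the padded record are made up, and 
splitlines(True) stands in for zips.readlines():

```python
# Simulates what zips.readlines() returns: each entry keeps its "\n".
raw_zips = "87101\n87102\n88001\n".splitlines(True)
print(len(raw_zips[0]))  # length includes the newline

# Strip the whitespace and build a set for fast membership tests.
match_zips = set(z.strip() for z in raw_zips)

# Hypothetical record, padded so the zip sits at positions 149..153.
record = "x" * 149 + "87102" + " rest of record\n"

print(record[149:154] in raw_zips)    # compared against "87102\n" etc.
print(record[149:154] in match_zips)  # compared against "87102" etc.
```

In your script that would mean building match_zips from the stripped 
lines before the list comprehension runs.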

>
> #!/usr/bin/env python
> # Find records that match zipcodes in zips.txt
>
> import os
> import sys
>
> def main():
>     infile = open("/Users/tgw/NM_2010/NM_APR.txt", "r")
>     outfile = open("zip_match_apr_2010.txt", "w")
>     zips = open("zips.txt", "r")
>     match_zips = zips.readlines()
>     lines = [ line for line in infile if line[149:154] in match_zips ]
>
>     outfile.write(''.join(lines))
> #    print line[149:154]
>     print lines
>     infile.close()
>     outfile.close()
> main()
>
>
>


-- 
Bob Gailer
919-636-4239
Chapel Hill NC


