Is there a faster way to do this?

Boris Borcic bborcic at gmail.com
Wed Aug 6 10:13:16 EDT 2008


Is your product ID always the 3rd and last item on the line?
Otherwise the IDs in your output won't be separated.

And how does

# Collect the distinct 3rd fields in a set, then write each one once.
output = open(output_file, 'w')
for x in set(line.split(',')[2] for line in open(input_file)):
    output.write(x)
output.close()

behave?
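
If you need to keep the whole line for the first occurrence of each ID, in the original file order, as your script does, a set can replace the list search. This is only a sketch, not a tested drop-in: the function name write_unique_lines and the id_column parameter are made up for illustration, and stripping whitespace from the ID is my assumption, so that an ID at the end of a line compares equal to the same ID elsewhere. The gain comes from the membership test: "in" on a list scans every element, while "in" on a set is a hash lookup that takes roughly constant time on average.

```python
def write_unique_lines(input_path, output_path, id_column=2):
    """Copy each line whose ID field has not been seen before.

    Returns (lines_read, lines_written). Both the function name and
    id_column are hypothetical, chosen for this example only.
    """
    seen = set()  # IDs already written; set lookup is O(1) on average
    input_count = 0
    output_count = 0
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            input_count += 1
            # Assumption: surrounding whitespace (including a trailing
            # newline when the ID is the last field) is not part of the ID.
            product_id = line.split(",")[id_column].strip()
            if product_id not in seen:
                seen.add(product_id)
                dst.write(line)
                output_count += 1
    return input_count, output_count
```

Calling write_unique_lines(input_file, output_file) would then return the two counts your script prints. Like your original loop, it raises an IndexError on a line with fewer than three fields.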


ronald.johnson at gmail.com wrote:
> I have a csv file containing product information that is 700+ MB in
> size. I'm trying to go through and pull out unique product IDs only
> as there are a lot of multiples. My problem is that I am appending the
> ProductID to an array and then searching through that array each time
> to see if I've seen the product ID before. So each search takes longer
> and longer. I let the script run for 2 hours before killing it, and it
> had only gotten through less than 1/10 of the file.
> 
> Here's the code:
> import string
> 
> def checkForProduct(product_id, product_list):
>     for product in product_list:
>         if product == product_id:
>             return 1
>     return 0
> 
> 
> input_file="c:\\input.txt"
> output_file="c:\\output.txt"
> product_info = []
> input_count = 0
> 
> input = open(input_file,"r")
> output = open(output_file, "w")
> 
> for line in input:
>     break_down = line.split(",")
>     product_number = break_down[2]
>     input_count+=1
>     if input_count == 1:
>         product_info.append(product_number)
>         output.write(line)
>         output_count = 1
>     if not checkForProduct(product_number,product_info):
>         product_info.append(product_number)
>         output.write(line)
>         output_count+=1
> 
> output.close()
> input.close()
> print input_count
> print output_count