List performance and CSV

Stephan usenet.filter at gmail.com
Sat Oct 8 11:19:43 EDT 2005


Hello,

I'm working on a simple project in Python that reads in two CSV files
and compares items in one file with items in another for matches.  I
read the files in using the csv module, adding each row to a list.
Then I run the comparison on the lists.  This works fine, but I'm
curious about performance.

Here's the main part of my code:

######
import csv

file1 = open("CustomerList.csv")
CustomerList = csv.reader(file1)
Customers = []

#Read in the contents of the CSV file into memory
for CustomerRecord in CustomerList:
    Customers.append(CustomerRecord)

#not shown here: the second file CustomersToMatch
#is loaded in a similar manner

#loop through each record and find matches on column 2
#breaking out of inner loop when a match is found
for loop1 in range(len(CustomersToMatch)):
    for loop2 in range(len(Customers)):
        if (CustomersToMatch[loop1][2] == Customers[loop2][2]) :
            CustomersToMatch[loop1][1] = Customers[loop2][1]
            break

######

With this code, it takes roughly 10 minutes on a 2 GHz x86 box to
compare two lists of 20,000 records.  Is that good?  Out of curiosity,
I tried psyco and saw no difference.  Is there a better Python syntax to
use?
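
For comparison, here is a minimal sketch of one alternative: index one
list by column 2 in a dictionary, then look each record up directly.
With two lists of 20,000 records the nested loops can make up to
20,000 x 20,000 = 400 million comparisons in the worst case, while the
dictionary version makes one pass over each list.  The names load_rows
and fill_matches are hypothetical, and the sketch assumes a current
Python 3 interpreter and that column 2 holds exact-match keys.

```python
import csv


def load_rows(path):
    """Read every row of a CSV file into a list of lists."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def fill_matches(customers_to_match, customers):
    """Copy column 1 from the first customer whose column 2 matches.

    Modifies customers_to_match in place and returns the match count.
    """
    # Build the index once. setdefault keeps the FIRST row seen for each
    # key, which mirrors the `break` in the nested-loop version.
    by_key = {}
    for record in customers:
        by_key.setdefault(record[2], record)

    matched = 0
    for record in customers_to_match:
        hit = by_key.get(record[2])
        if hit is not None:
            record[1] = hit[1]
            matched += 1
    return matched
```

Usage would be along the lines of
fill_matches(load_rows("CustomersToMatch.csv"), load_rows("CustomerList.csv")),
where the first file name is only a guess based on the variable name above.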

Thanks,
-Stephan



