Pythonic search of list of dictionaries

Bulba! bulba at bulba.com
Tue Jan 4 10:58:06 EST 2005


Hello everyone,

I'm reading the rows from a CSV file. csv.DictReader puts
those rows into dictionaries.
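
A minimal sketch of that reading step, for illustration only. It assumes a semicolon-delimited file whose first row holds the column names; the sample text and the helper name read_rows are hypothetical, and it uses Python 3 syntax rather than the Python 2 of the script further down.

```python
import csv
import io

# Hypothetical sample data standing in for one of the real CSV files.
SAMPLE = (
    "TermID;English;Polish\r\n"
    "4;System Administration;Zarzadzanie systemem\r\n"
)

def read_rows(text, delimiter=';'):
    """Return every data row as a dict keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]

rows = read_rows(SAMPLE)
print(rows[0]['English'])
```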

The actual files contain old and new translations of software
strings. The dictionary containing the row data looks like this:

    o={'TermID':'4', 'English':'System Administration',
       'Polish':'Zarzadzanie systemem'}

I put those dictionaries into a list:

   oldl=[x for x in orig]  # where orig=csv.DictReader(ofile ...

...and then search for matching source terms in two nested loops:

   for o in oldl:
       for n in newl:
           if n['English'] == o['English']:
               ...

Now, this works. However, not only is this very un-Pythonic, it is also
very inefficient: every old row is compared against every new row, so
the complexity is O(n**2) and it scales very badly.
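
One common way around the quadratic scan, sketched here under assumptions rather than as a definitive answer: index one list by its 'English' value in a dictionary, then look each row of the other list up in that index. The helper names index_by and match_rows and the sample rows are hypothetical; the syntax is Python 3.

```python
from collections import defaultdict

def index_by(rows, key):
    """Map each value found under `key` to the list of rows carrying it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index

def match_rows(old_rows, new_rows, key='English'):
    """Yield (old, new) pairs whose `key` values are equal."""
    new_index = index_by(new_rows, key)
    for old in old_rows:
        # .get() avoids adding empty entries for terms missing from new_rows.
        for new in new_index.get(old[key], ()):
            yield old, new

# Hypothetical sample rows.
old_rows = [
    {'TermID': '4', 'English': 'System Administration', 'Polish': 'Zarzadzanie systemem'},
    {'TermID': '5', 'English': 'Log out', 'Polish': 'Wyloguj'},
]
new_rows = [
    {'TermID': '9', 'English': 'System Administration', 'Polish': 'Administracja systemem'},
]

pairs = list(match_rows(old_rows, new_rows))
```

Building the index takes one pass over the new rows and each lookup is a dictionary access, so the work grows roughly with the number of rows plus the number of matches instead of with their product.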

What I want to know is whether there is some elegant and efficient
way of doing this, i.e. finding all the dictionaries dx_1 ... dx_n
contained in a list (or a dictionary) dy, where each dx_i contains
a specific value. Or possibly just the first such dictionary, dx_1.
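
For a one-off query of that kind, a list comprehension (all matches) or next() over a generator expression (first match only) may be enough. This is only a sketch in Python 3 syntax; find_all and find_first are hypothetical helper names, and each call still scans the whole list.

```python
def find_all(rows, key, value):
    """Return every row whose `key` entry equals `value`."""
    return [row for row in rows if row.get(key) == value]

def find_first(rows, key, value):
    """Return the first matching row, or None when nothing matches."""
    return next((row for row in rows if row.get(key) == value), None)

# Hypothetical sample rows.
sample_rows = [
    {'TermID': '4', 'English': 'System Administration'},
    {'TermID': '7', 'English': 'Log out'},
    {'TermID': '8', 'English': 'System Administration'},
]

matches = find_all(sample_rows, 'English', 'System Administration')
first = find_first(sample_rows, 'English', 'System Administration')
missing = find_first(sample_rows, 'English', 'No such term')
```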

I HAVE to search for values corresponding to the key 'English', since
there are big gaps in both files (i.e. there are a lot of rows
in the old file that do not correspond to rows in the new
file and vice versa). I don't want to do ugly things like converting
a dictionary to a string so I could use the string.find() method.
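
The gaps themselves can be listed with set operations on the 'English' values. A rough sketch in Python 3 syntax, assuming every row has an 'English' entry; compare_keys and the sample rows are hypothetical.

```python
def compare_keys(old_rows, new_rows, key='English'):
    """Split the `key` values into shared, old-only and new-only sets."""
    old_terms = {row[key] for row in old_rows}
    new_terms = {row[key] for row in new_rows}
    return {
        'common': old_terms & new_terms,
        'only_old': old_terms - new_terms,
        'only_new': new_terms - old_terms,
    }

# Hypothetical sample rows.
old_sample = [{'English': 'Log in'}, {'English': 'Log out'}]
new_sample = [{'English': 'Log out'}, {'English': 'Settings'}]

report = compare_keys(old_sample, new_sample)
```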

Obviously it does not have to be implemented this way. If
the data structures here could be designed in a proper
(Pythonesque ;-) way, great.

I do realize that this resembles doing some operation on
matrices.  But I have never tried doing something like this in
Python.


#---------- Code follows ---------

import sys
import csv

class excelpoldialect(csv.Dialect):
    delimiter=';'
    doublequote=True
    lineterminator='\r\n'
    quotechar='"'
    quoting=0
    skipinitialspace=False

epdialect=excelpoldialect()
csv.register_dialect('excelpol',epdialect)


try:
    ofile=open(sys.argv[1],'rb')
except IOError:
    print "Old file %s could not be opened" % (sys.argv[1])
    sys.exit(1)

try:
    tfile=open(sys.argv[2],'rb')
except IOError:
    print "New file %s could not be opened" % (sys.argv[2])
    sys.exit(1)

    
titles=csv.reader(ofile, dialect='excelpol').next()
orig=csv.DictReader(ofile, titles, dialect='excelpol')
transl=csv.DictReader(tfile, titles, dialect='excelpol')

cfile=open('cmpfile.csv','wb')
titles.append('New')
titles.append('RowChanged')
cm=csv.DictWriter(cfile,titles, dialect='excelpol')
cm.writerow(dict(zip(titles,titles)))


print titles
print "-------------"

oldl=[x for x in orig]
newl=[x for x in transl]

all=[]

for o in oldl:
    for n in newl:
        if n['English'] == o['English']:
            if n['Polish'] == o['Polish']:
                status=''
            else:
                status='CHANGED'
            combined={'TermID': o['TermID'], 'English': o['English'],
                      'Polish': o['Polish'], 'New': n['Polish'],
                      'RowChanged': status}
            cm.writerow(combined)
            all.append(combined)

            
# duplicates

dfile=open('dupes.csv','wb')
dupes=csv.DictWriter(dfile,titles,dialect='excelpol')
dupes.writerow(dict(zip(titles,titles)))

"""for i in xrange(0,len(all)-2):
    for j in xrange(i+1, len(all)-1):
        if (all[i]['English']==all[j]['English'] and
                all[i]['RowChanged']=='CHANGED'):
            dupes.writerow(all[i])
            dupes.writerow(all[j])"""
 
cfile.close()
ofile.close()
tfile.close()
dfile.close()
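
The disabled duplicate check above is again a pairwise scan. A possible alternative, sketched in Python 3 syntax, is to group the combined rows by their 'English' value and keep the groups with more than one row. The helper find_duplicates and the sample rows are hypothetical, and the 'CHANGED' filter from the original loop is left out for brevity.

```python
from collections import defaultdict

def find_duplicates(rows, key='English'):
    """Return {term: rows} for every term that occurs in more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {term: group for term, group in groups.items() if len(group) > 1}

# Hypothetical combined rows, shaped like the ones written to cmpfile.csv.
combined_rows = [
    {'TermID': '4', 'English': 'Log out', 'RowChanged': 'CHANGED'},
    {'TermID': '6', 'English': 'Log out', 'RowChanged': ''},
    {'TermID': '7', 'English': 'Settings', 'RowChanged': ''},
]

dupes_by_term = find_duplicates(combined_rows)
```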




