Dictionary/Hash question
Sick Monkey
sickcodemonkey at gmail.com
Tue Feb 6 18:31:17 EST 2007
Even though I am starting to get the hang of Python, I keep running into
problems that I cannot solve.
I have never used dictionaries before, but I feel they could really improve
efficiency when analyzing huge amounts of data, because a dictionary lookup
replaces a nested loop.
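To illustrate why a dictionary helps, here is a minimal sketch comparing a nested-loop search with a dictionary lookup. The function names `matches_nested` and `matches_dict` are hypothetical. Checking every item of one list against another list with loops costs roughly len(list1) * len(list2) comparisons, while a dictionary hashes its keys, so each `in` check takes constant time on average.

```python
def matches_nested(list1, list2):
    """Items of list2 that also appear in list1, found with nested loops."""
    found = []
    for item in list2:
        for other in list1:          # scans list1 again for every item of list2
            if item == other:
                found.append(item)
                break
    return found


def matches_dict(list1, list2):
    """Same result, but list1 is loaded into a dictionary first."""
    lookup = dict.fromkeys(list1)    # keys are the items, values are all None
    return [item for item in list2 if item in lookup]  # hashed lookup


if __name__ == "__main__":
    a = ["alice@example.com", "bob@example.com"]
    b = ["carol@example.com", "bob@example.com", "alice@example.com"]
    print(matches_nested(a, b))  # ['bob@example.com', 'alice@example.com']
    print(matches_dict(a, b))    # ['bob@example.com', 'alice@example.com']
```

Since the dictionary values are never used, a `set` would work just as well as the lookup table.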
Basically, I have two different files containing data. My program takes each
e-mail address found in one file and checks whether it exists in the other
file. If it finds a match, it writes that address to an output file.
---------------
Right now, the code opens file1 and stores all of its contents in a list, then
does the same for file2. Then it loops over list1 and inserts each item into a
hash table (a dictionary). I am trying to make this code more efficient. Here
is what I would rather have: when I open file1, send its contents directly to
the hash table, bypassing the intermediate list and the insertion loop in the
script. Is this possible?
import re

def fcompare(f1name, f2name):
    # Rough pattern for e-mail addresses
    mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
    # fopen() is not a built-in; open the files directly and bail out
    # if either one cannot be read
    try:
        with open(f1name) as f1:
            a = f1.readlines()
        with open(f2name) as f2:
            b = f2.readlines()
    except IOError:
        return 0
    file1List = []
    print("starting list 1")
    for c in a:
        file1List.extend(mailsrch.findall(c))
    print("storing File1 in dictionary.")
    d1 = {}
    for item in file1List:
        d1[item] = None          # only the keys matter; values are unused
    print("finished storing information in lists.")
    print("starting list 2")
    file2List = []
    for d in b:
        file2List.extend(mailsrch.findall(d))
    # Write every address from file2 that is also a key in d1
    with open("match.txt", "w") as utp:
        for item in file2List:
            if item in d1:       # has_key() was removed in Python 3; use "in"
                utp.write(item + '\n')
    print("finished comparing 2 lists.")
    return 1
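On the question of skipping the intermediate lists: one way to do it (a sketch, not tested against the original data) is to iterate over the file object directly, which yields one line at a time, and add each address to a `set` as it is found. The second file can be streamed the same way, writing matches as they appear. The function name `fcompare_streaming`, the `out_name` parameter, and the -1 error return are hypothetical; the regex is the one from `fcompare()`.

```python
import re

# Same rough e-mail pattern as in fcompare()
MAILSRCH = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')


def fcompare_streaming(f1name, f2name, out_name="match.txt"):
    """Write every address in f2name that also appears in f1name.

    Returns the number of matches written, or -1 if a file cannot be opened.
    """
    try:
        with open(f1name) as f1:
            # Build the lookup table straight from the file: no readlines()
            # and no intermediate list of addresses.
            seen = set(addr for line in f1 for addr in MAILSRCH.findall(line))
        count = 0
        with open(f2name) as f2, open(out_name, "w") as out:
            for line in f2:                       # one line at a time
                for addr in MAILSRCH.findall(line):
                    if addr in seen:              # average O(1) set lookup
                        out.write(addr + "\n")
                        count += 1
    except IOError:
        return -1
    return count


if __name__ == "__main__":
    import os
    import tempfile

    tmp = tempfile.mkdtemp()
    p1 = os.path.join(tmp, "list1.txt")
    p2 = os.path.join(tmp, "list2.txt")
    out = os.path.join(tmp, "match.txt")
    with open(p1, "w") as f:
        f.write("alice@example.com, bob@example.com\n")
    with open(p2, "w") as f:
        f.write("contact: bob@example.com\ncarol@example.com\n")
    print(fcompare_streaming(p1, p2, out))  # 1
    with open(out) as f:
        print(f.read().strip())             # bob@example.com
```

Only the set of addresses from file1 is held in memory; file2 is never loaded as a whole, which matters most when that file is the large one.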
More information about the Python-list
mailing list