Dictionary/Hash question

Sick Monkey sickcodemonkey at gmail.com
Tue Feb 6 18:31:17 EST 2007


Even though I am starting to get the hang of Python, I keep running into
problems that I cannot solve.
I have never used dictionaries before, but I believe they can really
improve efficiency when analyzing huge amounts of data (compared with
nested loops).
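
To illustrate the contrast drawn above, here is a minimal sketch that finds common items once with nested loops and once with a dictionary lookup. The function names and the sample addresses are made up for illustration and do not come from the actual program.

```python
def matches_nested(first, second):
    """Compare each item of second against every item of first."""
    found = []
    for item in second:
        for candidate in first:
            if item == candidate:
                found.append(item)
                break  # stop scanning first once a match is found
    return found


def matches_dict(first, second):
    """Store first in a dictionary, then do one lookup per item of second."""
    seen = {}
    for item in first:
        seen[item] = None  # only the keys matter; the value is a placeholder
    return [item for item in second if item in seen]


# Hypothetical sample data.
first = ["a@example.com", "b@example.com", "c@example.com"]
second = ["c@example.com", "x@example.com", "a@example.com"]
print(matches_nested(first, second))
print(matches_dict(first, second))
```

Both functions return the same matches; the dictionary version replaces the inner loop with a single hash lookup per item.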

Basically, I have two files containing data.  My program takes each entry
in one file and checks whether it also exists in the other file.  If it
finds a match, it writes the data to an output file.
---------------
Right now, the code opens file1 and stores all of its contents in a list.
Then it does the same for file2.  After that, it loops over list1 and
inserts each item into a hash table.  I am trying to find a way to make this
code more efficient.  What I would rather have is this: when I open file1,
send its contents directly to the hash table, bypassing the intermediate
list entirely.  Is this possible?

def fcompare(f1name, f2name):
    import re
    mailsrch = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
    # fopen is not a Python builtin; it is assumed here to be a helper
    # defined elsewhere that returns a false value on failure.
    f1 = fopen(f1name)
    f2 = fopen(f2name)
    if not f1 or not f2:
        return 0
    a = f1.readlines(); f1.close()
    b = f2.readlines(); f2.close()

    print("starting list 1")
    file1List = []
    for c in a:
        file1List.extend(mailsrch.findall(c))

    print("storing File1 in dictionary.")
    d1 = {}
    for item in file1List:
        d1[item] = None
    print("finished storing information in lists.")

    print("starting list 2")
    file2List = []
    for d in b:
        file2List.extend(mailsrch.findall(d))

    # Write every address from file2 that also appears in file1.
    utp = open("match.txt", "w")
    for item in file2List:
        if item in d1:
            utp.write(item + '\n')
    utp.close()
    #del file1List
    #del file2List
    print("finished comparing 2 lists.")
    #return 1
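
One possible way to skip the intermediate lists, offered as a sketch rather than a definitive answer: iterate over the file object line by line and feed the matches straight into the hash table. This version uses the builtin open instead of fopen, and a set in place of a dictionary whose values are all None; both are substitutions, and the names addresses_in and fcompare_streaming are hypothetical.

```python
import re

# Same pattern as in the post above.
MAILSRCH = re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')


def addresses_in(lines):
    """Yield every address found in an iterable of text lines."""
    for line in lines:
        for address in MAILSRCH.findall(line):
            yield address


def fcompare_streaming(f1name, f2name, outname="match.txt"):
    """Write addresses from f2name that also occur in f1name to outname.

    Returns the number of addresses written.
    """
    # Iterating over the file object reads one line at a time, so the
    # addresses go straight into the set with no intermediate list.
    with open(f1name) as f1:
        seen = set(addresses_in(f1))

    count = 0
    with open(f2name) as f2, open(outname, "w") as out:
        for address in addresses_in(f2):
            if address in seen:
                out.write(address + "\n")
                count += 1
    return count
```

Like the original, this writes an address once for each time it appears in the second file. Only the set built from the first file is held in memory; the second file is processed as it is read.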