Dictionary/Hash question

Gabriel Genellina gagsl-py at yahoo.com.ar
Tue Feb 6 19:49:11 EST 2007


On Tue, 06 Feb 2007 20:31:17 -0300, Sick Monkey <sickcodemonkey at gmail.com>
wrote:

> Even though I am starting to get the hang of Python, I continue to find
> myself finding problems that I cannot solve.
> I have never used dictionaries before and I feel that they really help
> improve efficiency when trying to analyze huge amounts of data (rather than
> having nested loops).

You are right, a list is not the right data structure in your case.
But a dictionary is a mapping from keys to values, and you have no values
to store.
In this case one should use a set: like a list, but unordered and with no
duplicate elements, and membership tests are fast, as with dictionary keys.
Also, it's not necessary to read all the lines at once; you can process both
files line by line. And since reading each file is the same operation,
you can make a function:

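# some_regular_expression stands for a compiled pattern (re.compile(...))
# that matches one e-mail address; it is not defined here.
def mailsfromfile(fname):
    result = set()
    with open(fname, 'r') as finput:
        for line in finput:
            mails = some_regular_expression.findall(line)
            if mails:
                result.update(mails)
    return result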

mails1 = mailsfromfile(f1name)
mails2 = mailsfromfile(f2name)

# & = set intersection: mails present in both files
for mail in mails1 & mails2:
    print(mail)  # or write mail to the output file
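
For reference, here is a minimal self-contained sketch of the same idea that can be run as is. The pattern MAIL_RE is only an assumption for illustration (a deliberately simple matcher, far from RFC-complete), and the helper names mails_from_lines, mails_from_file and common_mails, as well as the sample addresses, are made up; substitute your own regular expression and file names.

```python
import re

# Assumed pattern: a deliberately simple e-mail matcher, not RFC-complete.
MAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mails_from_lines(lines):
    """Collect every address found in an iterable of text lines into a set."""
    result = set()
    for line in lines:
        # findall returns a (possibly empty) list; update ignores duplicates
        result.update(MAIL_RE.findall(line))
    return result


def mails_from_file(fname):
    """Same as above, reading the file line by line instead of all at once."""
    with open(fname, "r") as finput:
        return mails_from_lines(finput)


def common_mails(fname1, fname2):
    """Addresses present in both files (set intersection)."""
    return mails_from_file(fname1) & mails_from_file(fname2)


# Made-up sample data standing in for the contents of the two files.
list1 = ["alice@example.com, bob@example.org", "alice@example.com"]
list2 = ["Bob <bob@example.org>", "carol@example.net"]

common = mails_from_lines(list1) & mails_from_lines(list2)
print(sorted(common))  # ['bob@example.org']
```

Each file is scanned once and the intersection costs roughly the size of the smaller set, which is what replaces the nested loops. With real files, common_mails(f1name, f2name) gives the same result as the loop above.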

-- 
Gabriel Genellina



