difflib qualm

Sick Monkey sickcodemonkey at gmail.com
Wed Jan 24 21:05:24 EST 2007


I am trying to write a Python script that will compare two files that
contain names (millions of them).

More specifically, I have two files (Files1.txt and Files2.txt).
Files1.txt contains 180 thousand names and
Files2.txt contains 34 million names.

I have a script that reads these two files and stores them in two
different lists (fileList1 and fileList2, respectively).  I import the
difflib library and, once the lists are created, match on the following
criteria " " for difflib -> (just the names that are similar between the two
files).
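
A minimal sketch of this kind of matching, under some assumptions: it uses
difflib.get_close_matches, takes one name per line, and picks a similarity
cutoff of 0.8 purely for illustration. The helper names load_names and
similar_names are hypothetical, and the two short lists stand in for the
real files.

```python
import difflib


def load_names(path):
    """Read one name per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def similar_names(small_list, big_list, cutoff=0.8):
    """Map each name in small_list to its close matches in big_list.

    The cutoff of 0.8 is an illustrative assumption, not a known setting.
    """
    matches = {}
    for name in small_list:
        # Every call scans the whole of big_list, which is why this
        # approach slows down sharply as the lists grow.
        close = difflib.get_close_matches(name, big_list, n=3, cutoff=cutoff)
        if close:
            matches[name] = close
    return matches


# Tiny in-memory stand-ins for Files1.txt and Files2.txt
fileList1 = ["John Smith", "Mary Jones"]
fileList2 = ["Jon Smith", "Mary Jones", "Peter Brown"]
result = similar_names(fileList1, fileList2)
```

With real data the lists would come from load_names("Files1.txt") and
load_names("Files2.txt") instead of the literals above.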

This works perfectly for hundreds of names, but it takes forever for
millions of them, so it is not really efficient.
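
A rough back-of-the-envelope count shows the scale, assuming every name in
the first list is compared against every name in the second. The throughput
figure is a made-up placeholder, not a measurement.

```python
names_in_file1 = 180_000       # size of Files1.txt, from the description above
names_in_file2 = 34_000_000    # size of Files2.txt, from the description above

# Number of name pairs an all-against-all comparison has to look at
pair_count = names_in_file1 * names_in_file2

# Hypothetical rate of one million pair comparisons per second
assumed_pairs_per_second = 1_000_000
estimated_days = pair_count / assumed_pairs_per_second / 86_400

print(f"{pair_count:,} pairs, about {estimated_days:.0f} days at the assumed rate")
```

Even at that optimistic assumed rate, the pairwise approach stays in the
range of months rather than minutes.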

Does anyone have any idea how to make this more efficient (in terms of
time and RAM)?

Any advice would be greatly appreciated.  (NOTE:  I have been trying to
study multithreading, but have not really grasped the concept, so I may need
some examples.)

~~~~~~~~~~~~~~
S.C.M.


More information about the Python-list mailing list