difflib qualm
Sick Monkey
sickcodemonkey at gmail.com
Wed Jan 24 21:05:24 EST 2007
I am trying to write a Python script that will compare 2 files which
contain names (millions of them).
More specifically, I have 2 files (Files1.txt and Files2.txt).
Files1.txt contains 180 thousand names and
Files2.txt contains 34 million names.
I have a script which will analyze these two files and store their names in 2
different lists (fileList1 and fileList2 respectively). I have imported the
difflib library and, after the lists are created, I match on the following
criteria " " for difflib -> (just the names that are similar between the two
files).
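
For illustration, here is a minimal sketch of the setup described above. It
assumes the matching is done with difflib.get_close_matches; the exact
criteria are not shown in this post, so the helper names (load_names,
similar_names), the cutoff of 0.8 and the limit of 3 matches per name are
hypothetical choices, not the actual script.

```python
import difflib

def load_names(path):
    """Read one name per line into a list, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]

def similar_names(file_list1, file_list2, cutoff=0.8):
    """Map each name in file_list1 to its close matches in file_list2.

    cutoff and n=3 are illustrative assumptions, not values from the post.
    """
    matches = {}
    for name in file_list1:
        # Every call scans all of file_list2, so the total work grows with
        # len(file_list1) * len(file_list2).
        close = difflib.get_close_matches(name, file_list2, n=3, cutoff=cutoff)
        if close:
            matches[name] = close
    return matches
```

It would be called roughly as
similar_names(load_names("Files1.txt"), load_names("Files2.txt")).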
This works perfectly for hundreds of names but is taking forever for
millions of them; thus not really efficient.
Does anyone have any idea how to make this more efficient (in terms of
time and RAM)?
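
One direction that might help, sketched here under assumptions rather than
as a tested solution: keep only the smaller list in memory, read the large
file one line at a time, and compare each name only against a bucket of
candidates that share its first letter. The function names (build_buckets,
stream_matches), the first-letter bucketing and the 0.8 cutoff are all
hypothetical. Bucketing trades accuracy for speed: similar names that differ
in their first character are missed, and the comparison itself stays
case-sensitive.

```python
import difflib
from collections import defaultdict

def build_buckets(small_names):
    """Group the smaller name list by lowercase first character."""
    buckets = defaultdict(list)
    for name in small_names:
        if name:
            buckets[name[0].lower()].append(name)
    return buckets

def stream_matches(big_lines, buckets, cutoff=0.8):
    """Yield (big_name, small_name) pairs, consuming big_lines lazily.

    big_lines can be an open file object, so the large file is never
    held in memory as a whole.
    """
    for line in big_lines:
        name = line.strip()
        if not name:
            continue
        # Only compare against names that start with the same letter.
        candidates = buckets.get(name[0].lower(), [])
        for hit in difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff):
            yield name, hit
```

With the files from this post, buckets would be built once from the names in
Files1.txt, and Files2.txt would be passed as an open file object to
stream_matches. This reduces the number of comparisons per name but does not
remove the cost of the fuzzy comparison itself.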
Any advice would be greatly appreciated. (NOTE: I have been trying to
study multithreading, but have not really grasped the concept. So I may need
some examples.)
~~~~~~~~~~~~~~
S.C.M.