[Tutor] Find (list) strings in large textfile

Danny Yoo dyoo at hashcollision.org
Thu Feb 9 20:08:18 EST 2017


Files don't rewind automatically, so after a loop has read through the
file to the end, any later loop over the same file object will finish
immediately without reading anything.

We might fix this with the file's "seek" method: calling seek(0) rewinds
the file to the beginning so it can be read again.
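
To make the rewinding behavior concrete, here is a small sketch.  The
function name count_lines_twice and the temporary demo file are made up
for illustration; they are not from your program.

```python
import os
import tempfile

def count_lines_twice(path):
    """Count the lines of one open file three times to show the effect of seek."""
    with open(path) as f:
        first = sum(1 for _ in f)      # reads the whole file
        exhausted = sum(1 for _ in f)  # file is at the end, so nothing is read
        f.seek(0)                      # rewind to the beginning
        second = sum(1 for _ in f)     # reads the whole file again
    return first, exhausted, second

# Hypothetical three-line demo file, created only for this example.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    demo_path = tmp.name

result = count_lines_twice(demo_path)
os.remove(demo_path)
print(result)
```

The middle count comes out as zero because the second loop starts where
the first one stopped, at the end of the file.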

However, your data is large enough that you might want to consider
efficiency too.  The nested loop approach is going to be expensive, taking
time proportional to the product of the sizes of your input files.

The problem can be done more efficiently, iterating over each file exactly
once.  Here is a sketch:

Try storing the numbers you are looking for in a set.  Building that set
is the loop over the first file.

Then, loop over the second file, consulting the set to see if the line is a
candidate or not.  Set membership is expected to be an inexpensive
operation: on average it takes about the same time no matter how many
items the set holds.
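
The two loops described above might look roughly like this.  This is only
a sketch under some assumptions: I don't know the exact layout of your
files, so I assume each line of the first file holds one number to look
for, and that a line of the second file is a match when the whole line
(with surrounding whitespace stripped) equals one of those numbers.  The
names find_matching_lines and key are hypothetical.

```python
import io

def find_matching_lines(wanted_lines, data_lines, key=str.strip):
    """Return the lines of data_lines whose key appears in wanted_lines.

    Both arguments are iterables of lines, such as open file objects.
    Each one is iterated exactly once.
    """
    # Loop over the first file: collect the search strings in a set,
    # skipping blank lines.
    wanted = {line.strip() for line in wanted_lines if line.strip()}

    # Loop over the second file: keep the lines whose key is in the set.
    matches = []
    for line in data_lines:
        if key(line) in wanted:
            matches.append(line.rstrip("\n"))
    return matches

# Small in-memory stand-ins for the two files (made-up data).
wanted_file = io.StringIO("101\n205\n\n")
data_file = io.StringIO("100\n101\n300\n205\n101\n")

result = find_matching_lines(wanted_file, data_file)
print(result)
```

If the number is only one field of each line rather than the whole line,
you could pass a different key function that extracts that field before
the set lookup.  With real files you would pass the objects returned by
open() instead of the StringIO stand-ins.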

