[Tutor] Finding a specific line in a body of text

Mon Mar 12 09:49:58 CET 2012

On 12/03/12 03:28, Steven D'Aprano wrote:

> Another approach may be to read the whole file into memory in one big
> chunk. 1.1 million lines, by (say) 50 characters per line comes to about
> 53 MB per file, which should be small enough to read into memory and
> process it in one chunk. Something like this:
>
> # again untested
> text = open('filename').read()
> results = []
> i = 0
> while i < len(text):
>      i = text.find(key, i)
>      if i == -1: break
>      i += len(key)  # skip the rest of the key
>      # read ahead to the next newline, twice
>      i = text.find('\n', i)
>      if i != -1: i = text.find('\n', i + 1)
>      if i == -1: break  # fewer than two lines follow the key
>      # now find the following newline, and save everything up to that
>      p = text.find('\n', i + 1)
>      if p == -1:  p = len(text)
>      results.append(text[i + 1:p])
>      i = p  # skip ahead

Or using readlines:

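index = 0
results = []
text = open('filename').readlines()
while True:
   try:
     # index() only matches a line that equals key exactly,
     # including its trailing newline
     index = text.index(key, index) + 2
     results.append(text[index])
   # IndexError: the key was within the last two lines of the file
   except (ValueError, IndexError): break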

readlines will take slightly more memory.
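
If the key is only part of a line, list.index() will not find
it, because it compares whole lines (newline included). Here is
a rough sketch that tests for a substring instead and reads the
input one line at a time, so memory use stays small. The name
lines_after_key and its offset parameter are my own inventions
for illustration, and I am assuming the line you want is always
two lines below the key:

```python
def lines_after_key(lines, key, offset=2):
    """Return the line `offset` lines below each line containing `key`."""
    wanted = set()   # line numbers we still need to collect
    results = []
    for n, line in enumerate(lines):
        if n in wanted:
            wanted.discard(n)
            results.append(line.rstrip('\n'))
        if key in line:
            wanted.add(n + offset)
    return results


sample = ["header\n", "KEY one\n", "skip\n", "want 1\n",
          "KEY two\n", "skip\n", "want 2\n", "KEY three\n"]
print(lines_after_key(sample, "KEY"))   # ['want 1', 'want 2']
```

For a real file you would pass the open file object, for example
lines_after_key(open('filename'), key). Unlike the loop above,
this checks every line for the key rather than resuming the
search two lines further on.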

But I suspect a tool like grep will be faster. grep
can be downloaded for Windows.

To use grep, explore the -A option, which prints a given
number of lines of context after each match.

Even using grep as a pre-filter to pipe into your
program might work.
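
As a sketch of the pre-filter idea: GNU grep with -A 2 prints
each matching line plus the two lines after it, and puts a line
containing just "--" between separate groups. The function below
picks the third line out of each group. The name third_lines is
hypothetical, and I am assuming the matches are at least three
lines apart and that no real line consists of "--" alone:

```python
def third_lines(grep_output):
    """Yield the third line of each group in `grep -A 2` output."""
    group = []
    for line in grep_output:
        line = line.rstrip('\n')
        if line == '--':        # separator between context groups
            group = []
            continue
        group.append(line)
        if len(group) == 3:     # matching line plus two context lines
            yield line
            group = []


sample = ["KEY a\n", "skip\n", "want 1\n", "--\n",
          "KEY b\n", "skip\n", "want 2\n"]
print(list(third_lines(sample)))   # ['want 1', 'want 2']
```

In a real script you would pass sys.stdin instead of the sample
list and run something like

grep -A 2 "key" filename | python filter.py

where filter.py is whatever you call the script.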

But you may also have to accept that processing 450
large files will take some time! You can help by
processing files in parallel, up to the number of cores
(less 1) in your PC, but other than that you may just
need a faster computer! Either more RAM or an SSD will
help greatly.
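
A minimal, untested-on-your-data sketch of the parallel idea,
using the standard multiprocessing module with one worker per
core, less one. KEY is a placeholder for your search string, the
function names are mine, and I am again assuming you want the
line two below each line containing the key:

```python
import multiprocessing
import sys

KEY = "KEY"   # placeholder: replace with the real search string


def worker_count():
    """Number of cores less one, but never fewer than one."""
    return max(1, multiprocessing.cpu_count() - 1)


def process_file(path):
    """Return (path, lines found two below each line containing KEY)."""
    wanted = set()
    results = []
    with open(path) as f:
        for n, line in enumerate(f):
            if n in wanted:
                wanted.discard(n)
                results.append(line.rstrip('\n'))
            if KEY in line:
                wanted.add(n + 2)
    return path, results


if __name__ == '__main__':
    paths = sys.argv[1:]   # file names given on the command line
    if paths:
        with multiprocessing.Pool(worker_count()) as pool:
            # results arrive in whatever order the files finish
            for path, results in pool.imap_unordered(process_file, paths):
                print(path, len(results))
```

Each file is handled by a single process, so this only helps
when there are many files, which is the case here. If the disk
is the bottleneck rather than the CPU, the gain may be small.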

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/