Scanning a file

Sat Oct 29 17:13:15 EDT 2005

Paul Watson wrote:

> Here is a better one that counts, and not just detects, the substring.  This 
> is -much- faster than using mmap; especially for a large file that may cause 
> paging to start.  Using mmap can be -very- slow.
> 
> <ss = pattern, be = len(ss) - 1>
> ...
> b = fp.read(blocksize)
> count = 0
> while len(b) > be:
>     count += b.count(ss)
>     b = b[-be:] + fp.read(blocksize)
> ...

In cases where that one wins and blocksize is big, this should do even
better, because it never copies a whole block just to glue the previous
block's tail onto it:
     ...
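     block = fp.read(blocksize)
     count = 0
     while len(block) > be:
         # Matches that lie wholly inside this block.
         count += block.count(ss)
         # Last be characters; block[-be:] would be the whole block if be == 0.
         lead = block[len(block) - be:]
         block = fp.read(blocksize)
         # Matches that straddle the boundary between the two blocks.
         count += (lead + block[:be]).count(ss)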
     ...
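
For anyone who wants to try it, here is a minimal self-contained sketch of the same loop as a function. The name count_in_stream and the default blocksize are made up for illustration. It assumes a binary stream whose read(n) returns fewer than n bytes only at end of file, a blocksize of at least len(ss), and a pattern that cannot overlap itself (str.count and bytes.count only count non-overlapping matches). The tail is taken as block[len(block) - be:] so that a one-byte pattern (be == 0) gets an empty tail.

```python
import io


def count_in_stream(fp, ss, blocksize=1 << 20):
    """Count occurrences of ss in the binary stream fp, one block at a time.

    Hypothetical helper: the name and the default blocksize are
    illustrative and not part of the original post.
    """
    if not ss:
        raise ValueError("pattern must not be empty")
    if blocksize < len(ss):
        # Otherwise a match could span more than two blocks and be missed.
        raise ValueError("blocksize must be at least len(ss)")
    be = len(ss) - 1  # longest piece of a match that can hang over a boundary
    count = 0
    block = fp.read(blocksize)
    while len(block) > be:
        # Matches that lie wholly inside the current block.
        count += block.count(ss)
        # Keep only the last be bytes (empty when be == 0).
        lead = block[len(block) - be:]
        block = fp.read(blocksize)
        # Matches that straddle the boundary: the window is at most
        # 2 * be bytes long, so every match in it crosses the boundary.
        count += (lead + block[:be]).count(ss)
    return count


if __name__ == "__main__":
    data = b"xxabcxxabcabxcabc"
    # A tiny blocksize forces matches to fall across block boundaries.
    print(count_in_stream(io.BytesIO(data), b"abc", blocksize=4))
```

With a real file, open it in binary mode and pass the file object in place of the io.BytesIO wrapper.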
-- 
-Scott David Daniels
scott.daniels at acm.org