Scanning a file
Scott David Daniels
scott.daniels at acm.org
Sat Oct 29 17:13:15 EDT 2005
Paul Watson wrote:
> Here is a better one that counts, and not just detects, the substring. This
> is -much- faster than using mmap; especially for a large file that may cause
> paging to start. Using mmap can be -very- slow.
>
> <ss = pattern, be = len(ss) - 1>
> ...
>     b = fp.read(blocksize)
>     count = 0
>     while len(b) > be:
>         count += b.count(ss)
>         b = b[-be:] + fp.read(blocksize)
> ...
In cases where that one wins and blocksize is big, this should do
even better, because it never copies a whole block just to prepend
the last few bytes of the previous one:
...
    block = fp.read(blocksize)
    count = 0
    while len(block) > be:
        count += block.count(ss)
        lead = block[-be:]
        block = fp.read(blocksize)
        count += (lead + block[:be]).count(ss)
...
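For anyone who wants to try the fragment above, here is a minimal self-contained sketch of the same idea. The function name count_in_chunks, the default blocksize, and the be == 0 guard are assumptions added for illustration; they are not part of the snippet above. It assumes a binary file object whose read() returns full blocks until end of file, and a blocksize larger than the pattern.

```python
import io


def count_in_chunks(fp, ss, blocksize=1 << 16):
    """Count occurrences of ss in the binary stream fp, one block at a time.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if not ss:
        raise ValueError("pattern must be non-empty")
    be = len(ss) - 1  # most bytes of a match that can sit on one side of a boundary
    count = 0
    block = fp.read(blocksize)
    while len(block) > be:
        # Matches that lie wholly inside this block.
        count += block.count(ss)
        # Added guard: block[-0:] would be the whole block, so a
        # one-byte pattern (be == 0) gets an empty tail instead.
        lead = block[-be:] if be else b""
        block = fp.read(blocksize)
        # lead and block[:be] are each shorter than ss, so a match found
        # here must straddle the boundary and was not counted above.
        count += (lead + block[:be]).count(ss)
    return count


# Small in-memory check; a real file opened with open(path, "rb") works the same way.
data = b"xxabcxx" * 50 + b"abc"
total = count_in_chunks(io.BytesIO(data), b"abc", blocksize=7)
```

Only the short lead + block[:be] string is built per iteration, so the cost of the concatenation does not grow with blocksize. One caveat: count() counts non-overlapping matches, so for a pattern that can overlap itself (such as b"aa") the per-block total may differ from a single count() over the whole data.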
--
-Scott David Daniels
scott.daniels at acm.org