Scanning a file
Alex Martelli
aleaxit at yahoo.com
Fri Oct 28 23:03:17 EDT 2005
Mike Meyer <mwm at mired.org> wrote:
...
> Except if you can't read the file into memory because it's too large,
> there's a pretty good chance you won't be able to mmap it either. To
> deal with huge files, the only option is to read the file in
> chunks, count the occurrences in each chunk, and then do some fiddling
> to deal with the pattern landing on a boundary.
That's the kind of thing generators are for...:
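    def byblocks(f, blocksize, overlap):
        # Yield successive blocks; each block after the first starts with
        # the last `overlap` bytes of the previous one.
        block = f.read(blocksize)
        yield block
        while True:
            fresh = f.read(blocksize - overlap)
            if not fresh:
                break  # EOF: nothing new to scan
            tail = block[-overlap:] if overlap else b''  # block[-0:] is the whole block
            block = tail + fresh
            yield block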
Now, to count the occurrences of a substring subst of length N in a binary file:
    f = open(whatever, 'rb')
    count = 0
    for block in byblocks(f, 1024*1024, len(subst)-1):
        count += block.count(subst)
    f.close()
not much "fiddling" needed, as you can see, and what little "fiddling"
is needed is entirely encompassed by the generator...
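To check the boundary handling on something small, here is a minimal, self-contained sketch for Python 3. It restates the generator so that it runs on its own, with an explicit end-of-file test. The wrapper name count_in_file, the sample data, and the use of io.BytesIO in place of a real file are assumptions made for illustration only. The sketch also assumes blocksize is at least len(subst), so that every read makes progress. Note that bytes.count counts non-overlapping matches, so for a pattern that can overlap itself (b'aa', say) the chunked total may differ from a single whole-file count.

```python
import io

def byblocks(f, blocksize, overlap):
    """Yield blocks read from f; each block after the first begins with
    the last `overlap` bytes of the previous block."""
    block = f.read(blocksize)
    yield block
    while True:
        fresh = f.read(blocksize - overlap)
        if not fresh:
            break  # end of file: nothing new to scan
        # block[-0:] would be the whole block, so treat overlap == 0 apart
        tail = block[-overlap:] if overlap else b''
        block = tail + fresh
        yield block

def count_in_file(f, subst, blocksize=1024 * 1024):
    """Count occurrences of subst in the binary file object f.

    Hypothetical helper: the name and the default block size are
    illustrative choices, not part of any library.
    """
    if not subst or blocksize < len(subst):
        raise ValueError("need a non-empty subst no longer than blocksize")
    # An overlap of len(subst) - 1 bytes is too short to hold a whole
    # match, so a match straddling a boundary is counted exactly once.
    return sum(block.count(subst)
               for block in byblocks(f, blocksize, len(subst) - 1))

# Sample data (made up): four occurrences of b'abc', two of them adjacent.
data = b'abcxxxxxabcabcyyabc'

# Tiny block sizes force matches to land on block boundaries.
for blocksize in (3, 4, 5, 7):
    assert count_in_file(io.BytesIO(data), b'abc', blocksize) == data.count(b'abc')

print(list(byblocks(io.BytesIO(b'abcdefg'), 4, 2)))
# prints [b'abcd', b'cdef', b'efg']
```

The printed blocks show the two-byte overlap carried from one block to the next; with a real file you would pass the object returned by open(path, 'rb') instead of the BytesIO.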
Alex