Scanning a file
Peter Otten
__peter__ at web.de
Sat Oct 29 04:34:24 EDT 2005
Bengt Richter wrote:
> On Fri, 28 Oct 2005 20:03:17 -0700, aleaxit at yahoo.com (Alex Martelli)
> wrote:
>
>>Mike Meyer <mwm at mired.org> wrote:
>> ...
>>> Except if you can't read the file into memory because it's too large,
>>> there's a pretty good chance you won't be able to mmap it either. To
>>> deal with huge files, the only option is to read the file in
>>> chunks, count the occurrences in each chunk, and then do some fiddling
>>> to deal with the pattern landing on a boundary.
>>
>>That's the kind of thing generators are for...:
>>
>>def byblocks(f, blocksize, overlap):
>>    block = f.read(blocksize)
>>    yield block
>>    while block:
>>        block = block[-overlap:] + f.read(blocksize-overlap)
>>        if block: yield block
>>
>>Now, to look for a substring of length N in an open binary file f:
>>
>>f = open(whatever, 'rb')
>>count = 0
>>for block in byblocks(f, 1024*1024, len(subst)-1):
>>    count += block.count(subst)
>>f.close()
>>
>>not much "fiddling" needed, as you can see, and what little "fiddling"
>>is needed is entirely encompassed by the generator...
>>
> Do I get a job at google if I find something wrong with the above? ;-)
Try it with a subst of length 1. Seems like you missed an opportunity :-)
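
To spell out the hint: with a subst of length 1 the overlap is 0, and
block[-0:] is the same as block[0:], i.e. the whole block rather than an
empty tail, so every earlier block gets carried along and recounted. As far
as I can tell the quoted loop also never stops at end of file when overlap
is positive, because the leftover tail keeps block non-empty. Below is a
minimal sketch of one way to patch both points. It is not code from the
thread: it assumes Python 3 bytes objects, the helper name count_in_file is
made up for illustration, and the totals only match a whole-file count()
when subst cannot overlap itself.

```python
import io

def byblocks(f, blocksize, overlap):
    """Yield blocks of f; each block after the first starts with the
    last `overlap` bytes of the previous block."""
    if not 0 <= overlap < blocksize:
        raise ValueError("need 0 <= overlap < blocksize")
    block = f.read(blocksize)
    while block:
        yield block
        fresh = f.read(blocksize - overlap)
        if not fresh:
            break  # end of file: stop instead of re-yielding the tail
        # block[-0:] would be the whole block, so treat overlap == 0 explicitly
        tail = block[-overlap:] if overlap else block[:0]
        block = tail + fresh

def count_in_file(f, subst, blocksize=1024 * 1024):
    """Hypothetical helper: count subst in an open binary file, block by block."""
    return sum(b.count(subst) for b in byblocks(f, blocksize, len(subst) - 1))

# Small check against an in-memory "file", including a length-1 subst.
data = b"spam eggs spam"
assert count_in_file(io.BytesIO(data), b"spam", blocksize=5) == data.count(b"spam")
assert count_in_file(io.BytesIO(data), b"s", blocksize=5) == data.count(b"s")
```

The explicit "if overlap" test is the whole fix for the length-1 case; the
early break is a separate repair for the end-of-file behaviour.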
Peter