Scanning a file
Bengt Richter
bokr at oz.net
Sat Oct 29 03:48:39 EDT 2005
On Fri, 28 Oct 2005 20:03:17 -0700, aleaxit at yahoo.com (Alex Martelli) wrote:
>Mike Meyer <mwm at mired.org> wrote:
> ...
>> Except if you can't read the file into memory because it's too large,
>> there's a pretty good chance you won't be able to mmap it either. To
>> deal with huge files, the only option is to read the file in
>> chunks, count the occurrences in each chunk, and then do some fiddling
>> to deal with the pattern landing on a boundary.
>
>That's the kind of thing generators are for...:
>
>def byblocks(f, blocksize, overlap):
>    block = f.read(blocksize)
>    yield block
>    while block:
>        block = block[-overlap:] + f.read(blocksize-overlap)
>        if block: yield block
>
>Now, to look for a substring of length N in an open binary file f:
>
>f = open(whatever, 'b')
>count = 0
>for block in byblocks(f, 1024*1024, len(subst)-1):
>    count += block.count(subst)
>f.close()
>
>not much "fiddling" needed, as you can see, and what little "fiddling"
>is needed is entirely encompassed by the generator...
>
Do I get a job at google if I find something wrong with the above? ;-)
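
For anyone following along, here is one possible reading of what goes wrong in the quoted code, offered as a sketch and not as the definitive fix. First, 'b' on its own is not a valid mode for open(); 'rb' is presumably what was meant. Second, once the file is exhausted, block[-overlap:] is still non-empty, so the while loop never ends and keeps yielding the same tail. Third, with overlap == 0 the slice block[-0:] is the whole block, not an empty string. The version below assumes Python 3 and a file opened in binary mode; the helper name count_in_file and the small blocksize in the demo are made up for illustration.

```python
import io

def byblocks(f, blocksize, overlap):
    """Yield blocks read from binary file f. Every block after the first
    starts with the last `overlap` bytes of the previous block."""
    if not 0 <= overlap < blocksize:
        raise ValueError("need 0 <= overlap < blocksize")
    block = f.read(blocksize)
    while block:
        yield block
        fresh = f.read(blocksize - overlap)
        if not fresh:
            break  # EOF: stop here instead of re-yielding the tail forever
        # block[-0:] would be the whole block, so treat overlap == 0 separately
        tail = block[-overlap:] if overlap else b""
        block = tail + fresh

def count_in_file(f, subst, blocksize=1024 * 1024):
    """Hypothetical helper: sum the per-block counts of subst."""
    return sum(b.count(subst) for b in byblocks(f, blocksize, len(subst) - 1))

# Demo on an in-memory "file"; a real one would come from open(path, 'rb').
data = io.BytesIO(b"xxabcxxabcx")
print(count_in_file(data, b"abc", blocksize=4))  # 2
```

One caveat: count() counts non-overlapping matches within each block, so for a pattern that can overlap itself (say b"aa" in a run of a's) the blockwise total can differ from what a single count() over the whole file would give.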
Regards,
Bengt Richter