Scanning a file

Sat Oct 29 18:36:47 EDT 2005

"Paul Watson" <pwatson at redlinepy.com> writes:

> "Mike Meyer" <mwm at mired.org> wrote in message 
> news:864q70evci.fsf at bhuda.mired.org...
>> "Paul Watson" <pwatson at redlinepy.com> writes:
> ...
>> Did you do timings on it vs. mmap? Having to copy the data multiple
>> times to deal with the overlap - thanks to strings being immutable -
>> would seem to be a lose, and makes me wonder how it could be faster
>> than mmap in general.
>
> The only thing copied is a string one byte less than the search string for 
> each block.

Um - you removed the code, but I could have *sworn* that it did
something like:

          buf = buf[testlen:] + f.read(bufsize - testlen)

which should cause the creation of three strings: the last few
bytes of the old buffer, a new bufferful from the read, then the sum
of those two - created by copying the first two into a new string. So
you wind up copying all the data.
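
To make the copying concrete, here is a rough sketch of that kind of
overlapped block scan. It is not the code from the earlier post: the
function name find_in_stream, the default bufsize, and the offset
bookkeeping are all made up for illustration. The comments mark where
each of the three strings is created.

```python
import io


def find_in_stream(f, needle, bufsize=4096):
    """Return the offset of the first match of needle in f, or -1.

    Hypothetical helper: f is any binary file-like object with read().
    """
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    if bufsize <= overlap:
        raise ValueError("bufsize must be larger than len(needle) - 1")

    buf = f.read(bufsize)
    base = 0  # file offset of buf[0]
    while True:
        pos = buf.find(needle)
        if pos != -1:
            return base + pos

        # String 2: a new block from the read.
        chunk = f.read(bufsize - overlap)
        if not chunk:
            return -1

        # String 1: the last len(needle) - 1 bytes of the old buffer,
        # kept so a match straddling the block boundary is not missed.
        start = max(0, len(buf) - overlap)
        tail = buf[start:]
        base += start

        # String 3: the concatenation, which copies both pieces into a
        # new object. This is where every byte read gets copied again.
        buf = tail + chunk
```

For example, find_in_stream(io.BytesIO(b"xxxxabcxxxx"), b"abc", bufsize=4)
finds the match even though it straddles a block boundary.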

Which, as you showed, doesn't take nearly as much time as using mmap.

       Thanks,
       <mike
-- 
Mike Meyer <mwm at mired.org>			http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.