Scanning a file
Paul Watson
pwatson at redlinepy.com
Sun Oct 30 20:53:17 EST 2005
Fredrik Lundh wrote:
> Paul Watson wrote:
>
>>This is Cygwin on Windows XP.
>
> using cygwin to analyze performance characteristics of portable API:s
> is a really lousy idea.
OK, I agree. That was just what I had at hand. Here are some other
numbers, to which due diligence has also not been applied. Source code
for both the plain file version and the mmap version is at the bottom.
I would welcome suggestions on what I could improve.
$ python -V
Python 2.4.1
$ uname -a
Linux ruth 2.6.13-1.1532_FC4 #1 Thu Oct 20 01:30:08 EDT 2005 i686
$ cat /proc/meminfo|head -2
MemTotal: 514232 kB
MemFree: 47080 kB
$ time ./scanfile.py
16384
real 0m0.06s
user 0m0.03s
sys 0m0.01s
$ time ./scanfilemmap.py
16384
real 0m0.10s
user 0m0.06s
sys 0m0.00s
Using a ~250 MB file, which is less than half of physical memory:
$ time ./scanfile.py
16777216
real 0m11.19s
user 0m10.98s
sys 0m0.17s
$ time ./scanfilemmap.py
16777216
real 0m55.09s
user 0m43.12s
sys 0m11.92s
==============================
$ cat scanfile.py
#!/usr/bin/env python
import sys
fn = 't.dat'
ss = '\x00\x00\x01\x00'
ss = 'time'
be = len(ss) - 1 # length of overlap to check
blocksize = 64 * 1024 # need to ensure that blocksize > overlap
fp = open(fn, 'rb')
b = fp.read(blocksize)
count = 0
while len(b) > be:
    count += b.count(ss)
    b = b[-be:] + fp.read(blocksize)
fp.close()
print count
sys.exit(0)
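
The chunked scan above can be restated as a function. This is only a
sketch, and it assumes Python 3 bytes objects rather than the Python
2.4.1 strings used in this thread. The name count_in_chunks and its
parameters are made up for illustration. It also guards the case the
script does not handle: a one-byte search string gives an overlap of 0,
and b[-0:] is the whole buffer rather than an empty one. The count is
exact only for patterns that cannot overlap themselves, such as 'time'.

```python
import io


def count_in_chunks(fp, pattern, blocksize=64 * 1024):
    """Count occurrences of pattern in a binary stream, one block at a time.

    Hypothetical helper: exact for patterns that cannot overlap themselves.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # bytes carried over between reads
    if blocksize <= overlap:
        raise ValueError("blocksize must exceed len(pattern) - 1")

    count = 0
    tail = b""
    while True:
        chunk = fp.read(blocksize)
        if not chunk:
            break
        buf = tail + chunk
        count += buf.count(pattern)
        # Keep the last len(pattern) - 1 bytes so that a match straddling
        # two reads is seen in the next buffer. The tail is too short to
        # hold a full match, so nothing is counted twice.
        tail = buf[-overlap:] if overlap else b""
    return count


if __name__ == "__main__":
    # Tiny blocksize to force matches to straddle reads.
    print(count_in_chunks(io.BytesIO(b"time after time"), b"time", blocksize=4))  # prints 2
```

For a real file it would be called as
count_in_chunks(open('t.dat', 'rb'), b'time').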
===================================
$ cat scanfilemmap.py
#!/usr/bin/env python
import sys
import os
import mmap
fn = 't.dat'
ss = '\x00\x00\x01\x00'
ss='time'
fp = open(fn, 'rb')
b = mmap.mmap(fp.fileno(), os.stat(fp.name).st_size,
              access=mmap.ACCESS_READ)
count = 0
foundpoint = b.find(ss, 0)
while foundpoint != -1 and (foundpoint + 1) < b.size():
    #print foundpoint
    count = count + 1
    foundpoint = b.find(ss, foundpoint + 1)
b.close()
print count
fp.close()
sys.exit(0)
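
One difference between the two scripts is worth noting. The count()
method counts non-overlapping matches, while the find() loop above
restarts one byte after each hit and so also counts overlapping ones.
For 'time' the results agree, but for a pattern that can overlap itself
they may not. The sketch below makes that choice explicit. It assumes
Python 3, and the name count_with_mmap and the allow_overlap flag are
made up for illustration.

```python
import mmap
import os


def count_with_mmap(path, pattern, allow_overlap=False):
    """Count occurrences of pattern in the file at path using mmap.find().

    Hypothetical helper. With allow_overlap=False the search resumes after
    the end of each match, which mirrors what bytes.count() does.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    step = 1 if allow_overlap else len(pattern)
    count = 0
    with open(path, "rb") as fp:
        if os.fstat(fp.fileno()).st_size == 0:
            return 0  # an empty file cannot be memory-mapped
        # Length 0 maps the whole file.
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as m:
            pos = m.find(pattern)
            while pos != -1:
                count += 1
                pos = m.find(pattern, pos + step)
    return count
```

Called as count_with_mmap('t.dat', b'time'), it should report the same
count as the chunked version for a pattern like 'time'.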