Scanning a file
Tony Nelson
*firstname*nlsnews at georgea*lastname*.com
Sun Oct 30 20:37:18 EST 2005
In article <1130637600.659212.66140 at g43g2000cwa.googlegroups.com>,
netvaibhav at gmail.com wrote:
> Steve Holden wrote:
> > Indeed, but reading one byte at a time is about the slowest way to
> > process a file, in Python or any other language, because it fails to
> > amortize the overhead cost of function calls over many characters.
> >
> > Buffering wasn't invented because early programmers had nothing better
> > to occupy their minds, remember :-)
>
> Buffer, and then read one byte at a time from the buffer.
Have you measured it?
#!/usr/bin/python
'''Time some file scanning.
'''
import sys, time
# Test 1: read the whole file in 256 KB chunks (cold cache on a first run).
f = open(sys.argv[1], 'rb')
t = time.time()
while True:
    b = f.read(256*1024)
    if not b:
        break
print 'initial read', time.time() - t
f.close()
# Test 2: the same read again, to see how much the OS cache helps.
f = open(sys.argv[1], 'rb')
t = time.time()
while True:
    b = f.read(256*1024)
    if not b:
        break
print 'second read', time.time() - t
f.close()
# Test 3: buffered read, then touch every character (set to 0 to skip).
if 1:
    f = open(sys.argv[1], 'rb')
    t = time.time()
    while True:
        b = f.read(256*1024)
        if not b:
            break
        for c in b:
            pass
    print 'third chars', time.time() - t
    f.close()
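# Test 4: count occurrences of a 4-byte pattern, chunk by chunk.
f = open(sys.argv[1], 'rb')
t = time.time()
n = 0
srch = '\x00\x00\x01\x00'
laplen = len(srch)-1    # bytes to carry over between chunks
lap = ''
while True:
    b = f.read(256*1024)
    if not b:
        break
    n += (lap+b[:laplen]).count(srch)    # matches straddling two chunks
    n += b.count(srch)                   # matches wholly inside this chunk
    lap = b[-laplen:]
print 'fourth scan', time.time() - t, n
f.close()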
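The overlap handling in the fourth test can be pulled out into a function. What follows is only a sketch, written for Python 3 (an assumption; the script above is Python 2), and the name count_pattern and its parameters are made up for illustration. It assumes the stream returns full chunks except at the end of the data, and that occurrences of the pattern do not overlap one another; overlapping occurrences may be counted differently than a single count() over the whole file would count them.

```python
import io


def count_pattern(stream, pattern, chunk_size=256 * 1024):
    """Count occurrences of `pattern` in a binary stream read in chunks.

    Hypothetical helper, not part of the original script.
    """
    overlap = len(pattern) - 1
    if not pattern or chunk_size < max(overlap, 1):
        raise ValueError("need a non-empty pattern and chunk_size >= len(pattern) - 1")
    tail = b""   # last `overlap` bytes of the previous chunk
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Matches that straddle the previous chunk and this one.
        total += (tail + chunk[:overlap]).count(pattern)
        # Matches that lie wholly inside this chunk.
        total += chunk.count(pattern)
        # chunk[-0:] would be the whole chunk, so guard the 1-byte pattern case.
        tail = chunk[-overlap:] if overlap else b""
    return total


if __name__ == "__main__":
    sample = b"ab" * 10 + b"\x00\x00\x01\x00" + b"cd" * 10 + b"\x00\x00\x01\x00"
    print(count_pattern(io.BytesIO(sample), b"\x00\x00\x01\x00", chunk_size=7))
```

The straddle buffer is at most 2 * (len(pattern) - 1) bytes long, so any match found in it must cross the chunk boundary and cannot also be counted by the per-chunk count().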
On my (old) system, with a 512 MB file (large enough that it won't all
fit in the cache), the second run of the script gives:
initial read 14.513395071
second read 14.8771388531
third chars 178.250257969
fourth scan 26.1602909565 1
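To put those numbers in proportion, here is a small sketch (Python 3, as an assumption) that turns the timings into throughput and into a ratio against the plain second read. It assumes the file is exactly 512 MB; the names timings, FILE_MB and summary are only for illustration. Note that every figure includes the time spent reading the file, not just the scanning.

```python
# Timings copied from the run above, in seconds.
timings = {
    "initial read": 14.513395071,
    "second read": 14.8771388531,
    "third chars": 178.250257969,
    "fourth scan": 26.1602909565,
}
FILE_MB = 512  # assumption: the test file is exactly 512 MB

baseline = timings["second read"]
summary = {}
for name, seconds in timings.items():
    rate = FILE_MB / seconds     # throughput in MB/s
    ratio = seconds / baseline   # cost relative to a plain buffered read
    summary[name] = (rate, ratio)
    print("%-13s %6.1f MB/s  %5.2fx the second read" % (name, rate, ratio))
```

Looping over every character takes roughly 12 times as long as just reading the file, while the count()-based scan takes less than twice as long.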
________________________________________________________________________
TonyN.:' *firstname*nlsnews at georgea*lastname*.com
' <http://www.georgeanelson.com/>