Scanning a file
Paul Watson
pwatson at redlinepy.com
Fri Oct 28 19:09:18 EDT 2005
<pinkfloydhomer at gmail.com> wrote in message
news:1130497567.764104.125110 at g44g2000cwa.googlegroups.com...
> I want to scan a file byte for byte for occurrences of the four-byte
> pattern 0x00000100. I've tried with this:
>
> # start
> import sys
>
> numChars = 0
> startCode = 0
> count = 0
>
> inputFile = sys.stdin
>
> while True:
>     ch = inputFile.read(1)
>     numChars += 1
>
>     if len(ch) < 1: break
>
>     startCode = ((startCode << 8) & 0xffffffff) | (ord(ch))
>     if numChars < 4: continue
>
>     if startCode == 0x00000100:
>         count = count + 1
>
> print count
> # end
>
> But it is very slow. What is the fastest way to do this? Using some
> native call? Using a buffer? Using whatever?
>
> /David
Here is an attempt at counting the occurrences using the mmap facility.
There appear to be some serious backward compatibility issues: I tried
Python 2.1 on Windows and AIX and had some odd results. If you are on 2.4.1
or higher, that should not be a problem.
#!/usr/bin/env python
import sys
import os
import mmap
fn = 't.dat'
ss = '\x00\x00\x01\x00'
fp = open(fn, 'rb')
b = mmap.mmap(fp.fileno(), os.stat(fp.name).st_size,
              access=mmap.ACCESS_READ)
count = 0
foundpoint = b.find(ss, 0)
while foundpoint != -1 and (foundpoint + 1) < b.size():
    count = count + 1
    foundpoint = b.find(ss, foundpoint + 1)
b.close()
print count
fp.close()
sys.exit(0)
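
For anyone reading this on a newer interpreter, the same mmap idea can be
sketched roughly as follows. This is only a sketch and assumes Python 3.2 or
later (bytes literals, mmap usable in a with statement). The function name
count_pattern and its parameters are made up for illustration and are not
part of the script above. It steps forward one byte after each hit, so
overlapping matches are counted, as in the loop above.

```python
import mmap
import os


def count_pattern(path, pattern=b'\x00\x00\x01\x00'):
    """Count occurrences (overlapping ones included) of pattern in a file.

    Hypothetical helper for illustration; assumes Python 3.2+.
    """
    # mmap refuses to map an empty file, so handle that case up front.
    if os.path.getsize(path) == 0:
        return 0

    count = 0
    with open(path, 'rb') as fp:
        # A length of 0 maps the whole file.
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as buf:
            pos = buf.find(pattern)
            while pos != -1:
                count += 1
                # Advance by one byte so overlapping matches are found too.
                pos = buf.find(pattern, pos + 1)
    return count
```

Calling count_pattern('t.dat') would then return the count rather than
printing it, which makes it easier to reuse from other code.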