Scanning a file

Fri Oct 28 19:09:18 EDT 2005

<pinkfloydhomer at gmail.com> wrote in message 
news:1130497567.764104.125110 at g44g2000cwa.googlegroups.com...
>I want to scan a file byte for byte for occurrences of the four-byte
> pattern 0x00000100. I've tried with this:
>
> # start
> import sys
>
> numChars = 0
> startCode = 0
> count = 0
>
> inputFile = sys.stdin
>
> while True:
>    ch = inputFile.read(1)
>    numChars += 1
>
>    if len(ch) < 1: break
>
>    startCode = ((startCode << 8) & 0xffffffff) | (ord(ch))
>    if numChars < 4: continue
>
>    if startCode == 0x00000100:
>        count = count + 1
>
> print count
> # end
>
> But it is very slow. What is the fastest way to do this? Using some
> native call? Using a buffer? Using whatever?
>
> /David

Here is an attempt at counting the occurrences using the mmap facility. There
appear to be some serious backward-compatibility issues: I tried Python 2.1 on
Windows and AIX and got some odd results. If you are using 2.4.1 or higher,
that should not be a problem.

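#!/usr/bin/env python
import sys
import os
import mmap

fn = 't.dat'                 # file to scan
ss = '\x00\x00\x01\x00'      # the four-byte pattern 0x00000100

fp = open(fn, 'rb')
size = os.fstat(fp.fileno()).st_size
count = 0
if size > 0:                 # mmap refuses to map an empty file
    # Map the whole file read-only; find() then scans it in C instead
    # of making one Python-level read() call per byte.
    b = mmap.mmap(fp.fileno(), size, access=mmap.ACCESS_READ)
    foundpoint = b.find(ss, 0)
    while foundpoint != -1:
        count = count + 1
        # Resume one byte past the last hit so that overlapping matches
        # are counted, just like the byte-at-a-time loop above.
        foundpoint = b.find(ss, foundpoint + 1)
    b.close()
fp.close()

print count
sys.exit(0)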
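For anyone on Python 3, here is a rough sketch of the same mmap idea, plus the "Using a buffer?" alternative from the question: read the input in fixed-size blocks and let bytes.find() do the scanning, carrying the last three bytes of each block over so that a match split across two blocks is not missed. The function names count_mmap and count_chunked and the 64 KiB block size are my own choices, not anything from the script above. Both versions count overlapping matches, as the original loop does.

```python
import mmap
import os

# The four-byte start code 0x00000100, as a bytes literal (Python 3).
PATTERN = b"\x00\x00\x01\x00"


def count_mmap(path, pattern=PATTERN):
    """Count overlapping occurrences of `pattern` in the file at `path`."""
    if os.path.getsize(path) == 0:
        return 0  # mmap raises ValueError for an empty file
    with open(path, "rb") as fp:
        # Length 0 maps the whole file; find() scans it in C.
        with mmap.mmap(fp.fileno(), 0, access=mmap.ACCESS_READ) as m:
            count = 0
            pos = m.find(pattern)
            while pos != -1:
                count += 1
                # Restart one byte later so overlapping hits are counted.
                pos = m.find(pattern, pos + 1)
            return count


def count_chunked(stream, pattern=PATTERN, chunk_size=64 * 1024):
    """Count overlapping occurrences in a binary stream read block by block."""
    keep = len(pattern) - 1  # bytes that may start a match split across blocks
    count = 0
    tail = b""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        buf = tail + block
        pos = buf.find(pattern)
        while pos != -1:
            count += 1
            pos = buf.find(pattern, pos + 1)
        # Carry over fewer bytes than the pattern length, so no match can
        # lie entirely inside the carried tail and be counted twice.
        tail = buf[-keep:] if keep else b""
    return count
```

count_chunked(sys.stdin.buffer) mimics the original script's read-from-stdin behavior, while count_mmap needs a real file path, since a pipe cannot be memory-mapped.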