Scanning a file
Paul Watson
pwatson at redlinepy.com
Fri Oct 28 11:59:24 EDT 2005
<pinkfloydhomer at gmail.com> wrote in message
news:1130497567.764104.125110 at g44g2000cwa.googlegroups.com...
>I want to scan a file byte for byte for occurrences of the four-byte
> pattern 0x00000100. I've tried with this:
>
> # start
> import sys
>
> numChars = 0
> startCode = 0
> count = 0
>
> inputFile = sys.stdin
>
> while True:
>     ch = inputFile.read(1)
>     numChars += 1
>
>     if len(ch) < 1: break
>
>     startCode = ((startCode << 8) & 0xffffffff) | (ord(ch))
>     if numChars < 4: continue
>
>     if startCode == 0x00000100:
>         count = count + 1
>
> print count
> # end
>
> But it is very slow. What is the fastest way to do this? Using some
> native call? Using a buffer? Using whatever?
>
> /David
How about something like:
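#!/usr/bin/env python
import sys

fn = 't.dat'
ss = b'\x00\x00\x01\x00'  # pattern to find (bytes, since the file is opened in binary mode)
be = len(ss) - 1  # length of overlap to check
blocksize = 4 * 1024  # need to ensure that blocksize > overlap

fp = open(fn, 'rb')
b = fp.read(blocksize)
found = 0
while len(b) > be:
    if b.find(ss) != -1:
        found = 1
        break
    # keep the last `be` bytes so a match straddling two blocks is not missed
    b = b[-be:] + fp.read(blocksize)
fp.close()
sys.exit(found)  # exit status is 1 if the pattern was found, 0 otherwise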
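
The script above only reports whether the pattern occurs at all, while the
original question asked for a count. Below is a rough sketch of a counting
variant built on the same block-plus-overlap idea. It assumes Python 3; the
function name count_pattern and its parameters are made up for illustration
and are not from the original posts. It counts overlapping matches too
(the pattern can overlap itself by one zero byte), which matches what the
byte-at-a-time loop in the question would report.

```python
import io


def count_pattern(stream, pattern=b"\x00\x00\x01\x00", blocksize=64 * 1024):
    """Count occurrences (overlapping ones included) of pattern in a binary stream."""
    overlap = len(pattern) - 1  # bytes carried over between blocks
    count = 0
    tail = b""
    while True:
        block = stream.read(blocksize)
        if not block:  # empty read means end of file
            break
        buf = tail + block
        pos = buf.find(pattern)
        while pos != -1:
            count += 1
            # resume one byte later so overlapping matches are counted
            pos = buf.find(pattern, pos + 1)
        # The tail is shorter than the pattern, so a match already counted
        # can never be found again in the next buffer.
        tail = buf[-overlap:] if overlap else b""
    return count


if __name__ == "__main__":
    # Two overlapping matches, read in tiny blocks to exercise the overlap logic.
    sample = b"\x00\x00\x01\x00\x00\x01\x00"
    print(count_pattern(io.BytesIO(sample), blocksize=3))
```

To run it on a real file, pass a file object opened with mode 'rb'; for
standard input, sys.stdin.buffer should work in place of sys.stdin. Most of
the speedup over the original loop comes from letting find() do the scanning
in C instead of calling read(1) once per byte.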