finding/replacing a long binary pattern in a .bin file

Fri Jan 14 22:55:51 EST 2005

On 14 Jan 2005 15:40:27 -0800, "yaipa" <yaipa at yahoo.com> wrote:

>Bengt, and all,
>
>Thanks for all the good input.   The problem seems to be that .find()
>is good for text files on Windows, but is not much use when it is
>binary data.  The script is for an Assy Language build tool, so I know
Did you try it? Why shouldn't find work for binary data?? At the end of
this, I showed an example of opening and modding a text file _in binary_.

 >>> s= ''.join(chr(i) for i in xrange(256))
 >>> s
 '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\
 x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`ab
 cdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f
 \x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7
 \xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf
 \xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7
 \xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef
 \xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
 >>> for i in xrange(256):
 ...     assert i == s.find(chr(i))
 ...
 >>>

I.e., all the finds succeeded for all 256 possible bytes. Why wouldn't you think that would work fine
for data from a binary file? Of course, find is case sensitive and fixed, not a regex, so it's
not very flexible. It wouldn't be that hard to expand to a list of old,new pairs as a change spec
though. Of course that would slow it down some.
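
A minimal sketch of the "list of old,new pairs" idea, written for Python 3, where binary data is bytes rather than str (an assumption; the session above is Python 2). The helper name replace_pairs is hypothetical and not part of anything posted in this thread. Pairs are applied one after another, so a later pair can also match bytes written by an earlier one.

```python
def replace_pairs(data, pairs):
    """Apply each (old, new) replacement, in order, to a bytes object."""
    for old, new in pairs:
        data = data.replace(old, new)
    return data

# Same check as the interactive session: every byte value is found
# at its own offset, so find has no trouble with binary data.
data = bytes(range(256))
assert all(data.find(bytes([i])) == i for i in range(256))

# A small change spec: two (old, new) pairs of equal length.
patched = replace_pairs(data, [(b'\x00\x01', b'\xff\xff'), (b'ABC', b'abc')])
```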

>the exact seek address  of the binary data that I need to replace, so
>maybe I'll just go that way.  It just seemed a little more general to
>do a search and replace rather than having to type in a seek address.
Except you run the risk of not having a unique search result, unless you
have a really guaranteed unique pattern.
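
One way to guard against that risk is to refuse to replace anything unless the pattern occurs exactly once. The following is only a sketch for Python 3 bytes; replace_unique is a hypothetical name, and it assumes the whole file fits in memory.

```python
def replace_unique(data, old, new):
    """Replace old with new only if old occurs exactly once in data."""
    first = data.find(old)
    if first < 0:
        raise ValueError('pattern not found')
    # Searching again from first + 1 also catches overlapping repeats,
    # which data.count(old) would not report.
    if data.find(old, first + 1) >= 0:
        raise ValueError('pattern is not unique')
    return data[:first] + new + data[first + len(old):]

result = replace_unique(b'\x00\x01ABC\xfe\xff', b'ABC', b'123')
```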
>
>Of course I could use a Lib function to convert the binary data to
>ascii and back, but seems a little over the top in this case.
I think you misunderstand Python strings. There is no need to "convert" the result
of open(filename, 'rb').read(chunksize). Re-read the example below ;-)
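
To make the point concrete, here is a small sketch (Python 3 assumed, so the data read in 'rb' mode is a bytes object; the temporary file and its contents are made up for the demo). The chunk that read() returns can be searched directly, with no conversion step.

```python
import os
import tempfile

# Write a scratch binary file containing every byte value, four times over.
payload = bytes(range(256)) * 4
fd, path = tempfile.mkstemp(suffix='.bin')
with os.fdopen(fd, 'wb') as f:
    f.write(payload)

# Read a chunk back in binary mode and search it as-is.
with open(path, 'rb') as f:
    chunk = f.read(512)
os.remove(path)

# The pattern straddles the end of the first 0..255 run.
offset = chunk.find(b'\xfe\xff\x00\x01')
```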
[...]
>>
>> If you wanted to change a binary file, you'd use it something like
                             ^^^^^^^^^^^
>> (although probably let the default buffer size be at 4096, not 20, which is pretty silly
>> other than demoing.
>> At least the input chunks are 512 ;-)
>>
>>  >>> from sreplace import sreplace
>>  >>> fw = open('sreplace.py.txt','wb')
        opens a binary output file

>>  >>> for buf in sreplace(iter(lambda f=open('sreplace.py','rb'):f.read(512), ''),'out','OUT',20):
        iter(f, sentinel) is the format above. It creates an iterator that
        keeps calling f() until f()==sentinel, which it doesn't return, and that ends the sequence.
        f in this case is lambda f=open(inputfilename, 'rb'):f.read(inputchunksize)
        and the sentinel is '' -- which is what is returned at EOF.
        The old thing to find was 'out', to be changed to 'OUT', and the 20 was a silly small
        return chunk size for the sreplace(...) iterator. All these chunks were simply passed
        to
>>  ...     fw.write(buf)
>>  ...
>>  >>> fw.close()
        and closing the file explicitly wrapped it up.
>>  >>> ^Z
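
The iter(f, sentinel) idiom described in the annotations can be tried on its own. This sketch assumes Python 3, where a binary read returns b'' at EOF, so the sentinel is b'' rather than ''. The helper name chunks and the in-memory io.BytesIO source are stand-ins chosen for the demo.

```python
import io

def chunks(f, size):
    """Yield successive reads of up to size bytes until EOF (b'')."""
    return iter(lambda: f.read(size), b'')

# An in-memory stand-in for a file opened with 'rb'.
src = io.BytesIO(b'0123456789')
pieces = list(chunks(src, 4))
```

The sentinel itself is never yielded, so pieces holds only the data chunks, the last of which may be short.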

I just typed that in interactively to demo the file change process with the source itself, so the diff
could show the changes. I guess I should have made sreplace.py runnable as a binary file updater, rather
than a cute demo using command line text. The files are no worry, but what is the source of your old
and new binary patterns that you want to use for find and replace? You can't enter them in unescaped format
on a command line, so you may want to specify them in separate binary files, or you could specify them
as Python strings in a module that could be imported. E.g.,

---< old2new.py >------
# example of various ways to specify binary bytes in strings
from binascii import unhexlify as hex2chr
old = (
'This is plain text.'
+ ''.join(map(chr,[33,44,55, 0xaa])) + '<<-- arbitrary list of binary bytes specified numerically if desired'
+ chr(33)+chr(44)+chr(55)+ '<<-- though this is plainer for a short sequence'
+ hex2chr('4142433031320001ff') + r'<<-- should be ABC012\x00\x01\xff'
)

new = '\x00'*len(old) # replace with zero bytes
-----------------------
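
For comparison, a rough Python 3 rendering of the same ways of spelling binary bytes (an assumption on my part; old2new.py above is Python 2, where str holds bytes). The explanatory '<<--' text is left out so the resulting value is easy to check.

```python
from binascii import unhexlify

old = (
    b'This is plain text.'
    + bytes([33, 44, 55, 0xaa])          # bytes given numerically
    + unhexlify('4142433031320001ff')    # bytes given as hex digits
)

# bytes.fromhex is a built-in alternative to unhexlify for str input.
same_tail = bytes.fromhex('4142433031320001ff')

new = b'\x00' * len(old)  # replace with zero bytes, same length as old
```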

BTW: Note: changing binaries can be dangerous! Do so at your own risk!!
And this has not been tested worth a darn, so caveat**n.

---< binfupd.py >------
from sreplace import sreplace
def main(infnam, outfnam, old, new):
    infile = open(infnam, 'rb')
    inseq = iter(lambda: infile.read(4096), '')
    outfile = open(outfnam, 'wb')
    try:
        try:
            for buf in sreplace(inseq, old, new):
                outfile.write(buf)
        finally:
            infile.close()
            outfile.close()
    except Exception, e:
        print '%s:%s' %(e.__class__.__name__, e)

if __name__ == '__main__':
    import sys
    try:
        oldnew = __import__(sys.argv[3])
        main(sys.argv[1], sys.argv[2], oldnew.old, oldnew.new)
    except Exception, e:
        print '%s:%s' %(e.__class__.__name__, e)
        raise SystemExit, """
    Usage: [python] binfupd.py infname outfname oldnewmodulename
        where infname is read in binary, and outfname is written
        in binary, replacing instances of old binary data with new
        specified as python strings named old and new respectively
        in a module named oldnewmodulename (without .py extension).
    """
-----------------------
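
binfupd.py depends on sreplace, which is not reproduced in this message. For readers who want to experiment without it, here is a stand-in sketch of a chunk-at-a-time replace that copes with patterns straddling chunk boundaries. It is NOT the sreplace from this thread: the name chunked_replace, its signature, and the Python 3 bytes handling are all assumptions.

```python
def chunked_replace(chunks, old, new):
    """Yield pieces of the input with every old replaced by new.

    chunks is any iterable of bytes objects. Matches are found left
    to right without overlapping, like bytes.replace.
    """
    if not old:
        raise ValueError('old pattern must be non-empty')
    keep = len(old) - 1  # longest tail that could begin a straddling match
    buf = b''
    for chunk in chunks:
        buf += chunk
        pos = buf.find(old)
        while pos >= 0:
            yield buf[:pos] + new
            buf = buf[pos + len(old):]
            pos = buf.find(old)
        # No complete match remains; hold back only a possible partial match.
        cut = len(buf) - keep
        if cut > 0:
            yield buf[:cut]
            buf = buf[cut:]
    if buf:
        yield buf

# The pattern b'XY' is split across two input chunks here.
out = b''.join(chunked_replace([b'ab', b'cX', b'Ya', b'bc'], b'XY', b'--'))
```

Holding back len(old) - 1 bytes is enough because any match beginning earlier would already lie wholly inside the buffer that was just searched.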

REMEMBER: NO WARRANTY FOR ANY PURPOSE! USE AT YOUR OWN RISK!

And, if you know where to seek to, that seems like the best way ;-)
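
A sketch of the seek-based approach, assuming Python 3 and a replacement of the same length as the bytes it overwrites. patch_at is a hypothetical helper, and the scratch file exists only for the demo. Checking the expected bytes first guards against patching the wrong offset.

```python
import os
import tempfile

def patch_at(path, offset, expected, new):
    """Overwrite bytes at offset in place after verifying what is there."""
    if len(new) != len(expected):
        raise ValueError('replacement must be the same length')
    with open(path, 'r+b') as f:  # read/write, binary, no truncation
        f.seek(offset)
        if f.read(len(expected)) != expected:
            raise ValueError('unexpected bytes at offset %d' % offset)
        f.seek(offset)
        f.write(new)

# Demo on a scratch file.
fd, demo_path = tempfile.mkstemp(suffix='.bin')
with os.fdopen(fd, 'wb') as f:
    f.write(b'\x00\x01OLD!\xfe\xff')
patch_at(demo_path, 2, b'OLD!', b'NEW!')
with open(demo_path, 'rb') as f:
    patched = f.read()
os.remove(demo_path)
```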

Regards,
Bengt Richter