find and replace string in binary file

Dave Angel davea at davea.name
Wed Mar 5 07:06:11 EST 2014


 loial <jldunn2000 at gmail.com> Wrote in message:
> How do I read a binary file, find/identify a character string and replace it with another character string and write out to another file?
> 
> It's the finding of the string in a binary file that I am not clear on.
> 
> Any help appreciated
> 

I see from another message that you're using Python 2.6. That
 makes a huge difference and should have been in your query, along
 with a minimal code sample.

Is the binary file under 100 MB or so? Then open it (in binary
 mode 'rb'), and read it. You'll now have a (large) byte string
 containing the entire file. 
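
As a minimal sketch of that step (the helper name read_binary is made up for illustration, not something from your code), reading the whole file might look like this:

```python
def read_binary(path):
    """Return the entire contents of `path` as one byte string.

    Hypothetical helper: only sensible when the file fits comfortably
    in memory.
    """
    # 'rb' turns off newline translation and any text decoding, so the
    # bytes come back exactly as they are stored on disk.
    with open(path, 'rb') as f:
        return f.read()
```

In Python 2.6 the result is an ordinary str; in Python 3 it is a bytes object. Either way, it supports find and replace.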

The next question is whether you're sure that your search and
 replace strings are ASCII. Assuming so is probably a mistake,
 but it will get you started.

Now the substitution is trivial:
        new_bytes = old_bytes.replace(search, replace)
Then write new_bytes to the output file, opened in binary mode
 'wb'. It's also possible to emulate replace with find and
 slicing, which is mainly useful if you need to report progress to
 the user.
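
Here is a rough sketch of both variants. The function names and the report callback are my own invention for illustration; the first function is the plain read/replace/write round trip, the second emulates replace with find and slicing so it can call back on each match.

```python
def replace_in_file(src_path, dst_path, search, replacement):
    """Read src_path, replace every match, write the result to dst_path."""
    with open(src_path, 'rb') as f:
        old_bytes = f.read()
    new_bytes = old_bytes.replace(search, replacement)
    with open(dst_path, 'wb') as f:
        f.write(new_bytes)


def replace_with_progress(data, search, replacement, report=None):
    """Emulate data.replace(search, replacement) with find and slicing.

    `report` (hypothetical callback) is called with the offset of each
    match in the original data.
    """
    if not search:
        raise ValueError("search string must not be empty")
    pieces = []
    pos = 0
    while True:
        hit = data.find(search, pos)
        if hit == -1:
            break
        pieces.append(data[pos:hit])   # untouched bytes before the match
        pieces.append(replacement)
        if report is not None:
            report(hit)
        pos = hit + len(search)        # resume after the match
    pieces.append(data[pos:])          # tail after the last match
    return b"".join(pieces)
```

Collecting the pieces in a list and joining once avoids recopying a large byte string on every match.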

If the search and/or replace strings are not ASCII, you have to
 know which encoding the file uses for them. Build a Unicode
 string, encode it the same way the file does, and then call the
 replace method on the resulting bytes.
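
A small sketch of that idea, with the caveat that the encoding name is something you have to find out about your particular file; the function name and the sample strings below are only illustrative assumptions.

```python
def replace_text_in_bytes(data, search_text, replace_text, encoding):
    """Replace Unicode text inside a byte string.

    `encoding` must be whatever the file really uses for these strings;
    nothing here can detect it for you.
    """
    search = search_text.encode(encoding)
    replacement = replace_text.encode(encoding)
    return data.replace(search, replacement)


# Illustrative data only: a UTF-8 string embedded between two NUL bytes.
sample = b"\x00" + u"caf\u00e9".encode("utf-8") + b"\x00"
patched = replace_text_in_bytes(sample, u"caf\u00e9", u"tea", "utf-8")
```

If the file stores UTF-16 text, use 'utf-16-le' or 'utf-16-be' rather than plain 'utf-16', because the latter prepends a byte order mark to every encoded string and the search would then never match.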

Now for a huge caveat. If you don't know the binary format,
 you risk creating pure junk. Here are just two examples of what
 might go wrong, assuming the file is an executable. The same
 risks exist for other file types; an executable is just my
 example.

If the two byte strings are not the same length, then all the
 code and data after the replacement will shift to new offsets.
 If you're lucky, the code will crash quickly, since every
 pointer referencing that code and data is now incorrect.
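
One way to sidestep the shifting problem, sketched under a big assumption: the strings in your file are NUL-terminated (C style), so a shorter replacement can be padded with zero bytes. That may well not hold for your format, and the function name is made up.

```python
def replace_same_length(data, search, replacement, pad=b"\x00"):
    """Replace `search` without changing the length of `data`.

    Assumption: padding with `pad` is harmless in this file format,
    e.g. because the strings are NUL-terminated. A longer replacement
    cannot be made to fit, so it is refused.
    """
    if len(pad) != 1:
        raise ValueError("pad must be exactly one byte")
    if len(replacement) > len(search):
        raise ValueError("replacement is longer than the search string")
    padded = replacement + pad * (len(search) - len(replacement))
    return data.replace(search, padded)
```

Because the output has the same length as the input, every offset elsewhere in the file still points at the same bytes.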

If some non-textual part of the file happens to match your search
 string, you're likely to trash that portion of the code. If
 the search string is long enough, maybe this is unlikely. But
 I recall taking the challenge of writing assembly programs which
 could be generated entirely from one or more type commands
 (MS-DOS), so machine code can look exactly like text.
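
Before replacing anything, it may be worth listing where the matches are and eyeballing the bytes around each one. A minimal sketch follows; the function name and the context parameter are my own.

```python
def find_matches(data, search, context=8):
    """Return a list of (offset, surrounding_bytes) for each match.

    Matches are non-overlapping, the same way replace() treats them.
    `context` is how many bytes to show on either side.
    """
    if not search:
        raise ValueError("search string must not be empty")
    matches = []
    pos = data.find(search)
    while pos != -1:
        start = max(0, pos - context)
        end = pos + len(search) + context
        matches.append((pos, data[start:end]))
        pos = data.find(search, pos + len(search))
    return matches
```

Printing repr() of each surrounding slice makes it easier to judge whether a hit sits inside real text or inside something that merely looks like it. It proves nothing, but an unexpected number of hits is a good reason to stop.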



-- 
DaveA




More information about the Python-list mailing list