Convert raw binary file to ascii

Mon Jul 27 16:07:16 EDT 2009

r2 wrote:
> On Jul 27, 9:06 am, Peter Otten <__pete... at web.de> wrote:
>   
>> r2 wrote:
>>     
>>> I have a memory dump from a machine I am trying to analyze. I can view
>>> the file in a hex editor to see text strings in the binary code. I
>>> don't see a way to save these ascii representations of the binary, so
>>> I went digging into Python to see if there were any modules to help.
>>>       
>>> I found one I think might do what I want it to do - the binascii
>>> module. Can anyone describe to me how to convert a raw binary file to
>>> an ascii file using this module. I've tried? Boy, I've tried.
>>>       
>> That won't work because a text editor doesn't need any help to convert the
>> bytes into characters. If it expects ascii it just will be puzzled by bytes
>> that are not valid ascii. Also, it will happily display byte sequences that
>> are valid ascii, but that you as a user will see as gibberish because they
>> were meant to be binary data by the program that wrote them.
>>
>>     
>>> Am I correct in assuming I can get the converted binary to ascii text
>>> I see in a hex editor using this module? I'm new to this forensics
>>> thing and it's quite possible I am mixing technical terms. I am not
>>> new to Python, however. Thanks for your help.
>>>       
>> Unix has the "strings" command-line tool to extract text from a binary.
>> Get hold of a copy of the MinGW tools if you are on Windows.
>>
>> Peter
>>     
>
> Okay. Thanks for the guidance. I have a machine with Linux, so I
> should be able to do what you describe above. Could Python extract the
> strings from the binary as well? Just wondering.
>
>   
Yes, you could do the same thing in Python easily enough, with the
advantage that you could define your own meaning for "characters."

The memory dump could be storing characters that are strictly ASCII.  Or
it could have EBCDIC, or UTF-8.  It could also be UTF-16 or UTF-32
(Unicode in 16-bit or 32-bit units), big-endian or little-endian.  Or the
characters could be in some other format specific to a particular program.
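
To make the difference concrete, here is a small sketch (Python 3; the
sample text "Hello" and the variable names are made up for illustration)
that encodes the same five characters several ways and prints the bytes
in hex. Note that cp500 is just one of several EBCDIC code pages.

```python
# The same five characters, stored in six different encodings.
text = "Hello"

encodings = ["ascii", "cp500", "utf-8", "utf-16-le", "utf-16-be", "utf-32-le"]
encoded = {name: text.encode(name) for name in encodings}

for name in encodings:
    # bytes.hex() shows each stored byte as two hex digits
    print("%-10s %s" % (name, encoded[name].hex()))
```

A scanner that only looks for runs of ASCII bytes would find "Hello" in
the ASCII and UTF-8 forms, but not in the others: in UTF-16 every
character is separated by a zero byte, and in EBCDIC the letters sit
above 0x7f.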

Either way, it's probably very useful to see what a "strings" program
might look like, because you can quickly code variations on it to suit
your particular data.
Something like the following (totally untested):

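def isprintable(byte):
    # byte is an integer 0-255; 0x7f is DEL (a control code), so stop at 0x7e
    return 0x20 <= byte <= 0x7e

def string(filename):
    # a bytearray yields integers when iterated, in Python 2 and Python 3 alike
    with open(filename, "rb") as f:
        data = bytearray(f.read())
    line = ""
    for byte in data:
        if isprintable(byte):
            line = line + chr(byte)
        else:
            # cutoff: don't print strings shorter than this because
            # they're probably just coincidence
            if len(line) > 4:
                print(line)
            line = ""    # start a fresh run after every non-printable byte
    if len(line) > 4:    # the file may end in the middle of a string
        print(line)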

Now you can change the definition of what's "printable", and you can
change the minimum length that you care about.  And of course you can
fine-tune things like maximum line length and such.
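
As one possible variation, here is a sketch (Python 3) that takes the
minimum length and the encoding as parameters and uses a regular
expression instead of a byte-by-byte loop. The names extract_strings and
UNITS, and the sample bytes, are made up for illustration; it handles
only printable ASCII characters, stored either as plain bytes or as
UTF-16.

```python
import re

# One regex fragment per encoding: how a single printable ASCII
# character appears in the raw bytes.
UNITS = {
    "ascii": r"[\x20-\x7e]",
    "utf-16-le": r"[\x20-\x7e]\x00",
    "utf-16-be": r"\x00[\x20-\x7e]",
}

def extract_strings(data, min_length=5, encoding="ascii"):
    """Return the printable runs of at least min_length characters in data."""
    pattern = "(?:%s){%d,}" % (UNITS[encoding], min_length)
    regex = re.compile(pattern.encode("ascii"))
    return [match.decode(encoding) for match in regex.findall(data)]

# Hypothetical dump: an ASCII string, a short coincidental run, and a
# UTF-16 little-endian string, separated by non-printable bytes.
sample = (b"\x01\x02password\x00\xffab\x03"
          + "C:\\Windows".encode("utf-16-le") + b"\x07")

print(extract_strings(sample))
print(extract_strings(sample, min_length=2))
print(extract_strings(sample, encoding="utf-16-le"))
```

The default of 5 matches the "count > 4" cutoff above. For a real dump
you would read the bytes with open(filename, "rb").read() and pass them
in as data.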

DaveA