Convert raw binary file to ascii

r2 rlichlighter at gmail.com
Mon Jul 27 15:33:21 EDT 2009


On Jul 27, 2:07 pm, Peter Otten <__pete... at web.de> wrote:
> r2 wrote:
> > On Jul 27, 9:06 am, Peter Otten <__pete... at web.de> wrote:
> >> r2 wrote:
> >> > I have a memory dump from a machine I am trying to analyze. I can view
> >> > the file in a hex editor to see text strings in the binary code. I
> >> > don't see a way to save these ascii representations of the binary, so
> >> > I went digging into Python to see if there were any modules to help.
>
> >> > I found one I think might do what I want it to do - the binascii
> >> > module. Can anyone describe to me how to convert a raw binary file to
> >> > an ascii file using this module? I've tried. Boy, I've tried.
>
> >> That won't work because a text editor doesn't need any help to convert
> >> the bytes into characters. If it expects ascii it will just be puzzled by
> >> bytes that are not valid ascii. Also, it will happily display byte
> >> sequences that are valid ascii, but that you as a user will see as
> >> gibberish because they were meant to be binary data by the program that
> >> wrote them.
>
> >> > Am I correct in assuming I can get the converted binary to ascii text
> >> > I see in a hex editor using this module? I'm new to this forensics
> >> > thing and it's quite possible I am mixing technical terms. I am not
> >> > new to Python, however. Thanks for your help.
>
> >> Unix has the "strings" command-line tool to extract text from a binary.
> >> Get hold of a copy of the MinGW tools if you are on Windows.
>
> >> Peter
>
> > Okay. Thanks for the guidance. I have a machine with Linux, so I
> > should be able to do what you describe above. Could Python extract the
> > strings from the binary as well? Just wondering.
>
> As a special service for you here is a naive implementation to build upon:
>
> #!/usr/bin/env python
> import sys
>
> wanted_chars = ["\0"]*256
> for i in range(32, 127):
>     wanted_chars[i] = chr(i)
> wanted_chars[ord("\t")] = "\t"
> wanted_chars = "".join(wanted_chars)
>
> THRESHOLD = 4
>
> for s in sys.stdin.read().translate(wanted_chars).split("\0"):
>     if len(s) >= THRESHOLD:
>         print s
>
> Peter

Perfect! Thanks.
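
For anyone reading this on Python 3: the script quoted above is Python 2 (it uses the print statement and a 256-character str.translate table). Below is a minimal sketch of the same idea for Python 3, not Peter's code. It works on bytes and swaps the translate/split step for a regular expression that matches the same runs of tab plus printable ASCII. The name extract_strings, its threshold parameter, and the sample data are made up for illustration.

```python
import re


def extract_strings(data, threshold=4):
    """Return the runs of printable ASCII (plus tab) in a bytes object.

    Only runs at least `threshold` bytes long are kept, mirroring the
    THRESHOLD = 4 cutoff in the quoted script.
    """
    # Tab or any byte in 0x20..0x7e, repeated `threshold` or more times.
    pattern = re.compile(rb"[\t\x20-\x7e]{%d,}" % threshold)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical sample standing in for a slice of a memory dump.
sample = b"\x00\x01hello\x00ab\x00wor\tld!\xff"
for s in extract_strings(sample):
    print(s)
```

To run it on a real dump, read the file in binary mode, for example open(path, "rb").read(), and pass the result to extract_strings. That reads the whole file into memory, just as the original script does with stdin, so a very large dump would need to be processed in chunks.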