Finding a text in raw data(size nearly 10GB) and Printing its memory address using python

Chris Angelico rosuav at gmail.com
Mon Apr 23 13:31:14 EDT 2018


On Tue, Apr 24, 2018 at 3:24 AM, Hac4u <samakshkaushik at gmail.com> wrote:
> I have a raw data of size nearly 10GB. I would like to find a text string and print the memory address at which it is stored.
>
> This is my code
>
> import os
> import re
> filename="filename.dmp"
> read_data=2**24
> searchtext="bd:mongo:"
> he=searchtext.encode('hex')

Why encode it as hex?

> with open(filename, 'rb') as f:
>     while True:
>         data= f.read(read_data)
>         if not data:
>             break
>         elif searchtext in data:
>             print "Found"
>             try:
>                 offset=hex(data.index(searchtext))
>                 print offset
>             except ValueError:
>                 print 'Not Found'
>         else:
>             continue

Your loop reads a slab of data from the file and searches only that
slab. On a match, you then search the same slab a second time to get
the index, and print it - but that index is the offset within the
current chunk, not within the file. You'll need to keep track of each
chunk's starting position in order to get the actual offset.
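
A minimal sketch of tracking the chunk position, assuming Python 3 (the
quoted code is Python 2) and a bytes search string. The names
find_offsets_per_chunk, needle and chunk_size are made up for
illustration. It searches each chunk only once and still misses a
match that straddles two chunks:

```python
def find_offsets_per_chunk(path, needle, chunk_size=2**24):
    """Yield absolute file offsets of needle, searching each chunk on its own.

    Hypothetical helper: needle must be bytes (e.g. b"bd:mongo:").
    Matches that cross a chunk boundary are NOT found by this version.
    """
    with open(path, "rb") as f:
        chunk_start = 0  # absolute offset of the current chunk's first byte
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            # find() returns -1 instead of raising, so one search is enough
            pos = data.find(needle)
            while pos != -1:
                yield chunk_start + pos
                pos = data.find(needle, pos + 1)
            chunk_start += len(data)
```

Usage would look something like
"for off in find_offsets_per_chunk('filename.dmp', b'bd:mongo:'): print(hex(off))".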

Also, you won't find the string if it spans a chunk boundary. You may
need to cope with that.
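
One way to cope, sketched under the same assumptions (Python 3, bytes
search string, made-up names find_offsets, needle, chunk_size): carry
the last len(needle) - 1 bytes of each buffer over into the next one,
so a match split across two reads is seen whole. The carried tail is
one byte too short to hold a full match, so nothing is reported twice.

```python
def find_offsets(path, needle, chunk_size=2**24):
    """Yield absolute file offsets of needle, including boundary-spanning hits.

    Hypothetical helper: needle must be non-empty bytes.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    overlap = len(needle) - 1  # bytes carried from one buffer to the next
    with open(path, "rb") as f:
        tail = b""      # end of the previous buffer
        buf_start = 0   # absolute file offset of buf[0]
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            buf = tail + data
            pos = buf.find(needle)
            while pos != -1:
                yield buf_start + pos
                pos = buf.find(needle, pos + 1)
            # keep only the bytes that could begin a match finishing later
            tail = buf[-overlap:] if overlap else b""
            buf_start += len(buf) - len(tail)
```

The offsets it yields are positions in the file, so hex(offset) gives
the same kind of value the original code was printing, but relative to
the start of the file rather than the start of the chunk.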

ChrisA


