Finding a text in raw data(size nearly 10GB) and Printing its memory address using python

Grant Edwards grant.b.edwards at gmail.com
Mon Apr 23 16:10:12 EDT 2018


On 2018-04-23, Hac4u <samakshkaushik at gmail.com> wrote:

> I have a raw data of size nearly 10GB. I would like to find a text
> string and print the memory address at which it is stored.

The first thing I would try is to map the file into memory as a string
(Python 2) or bytearray (Python 3), and then search it using the
find() method.

My theory is you let the OS worry about shuffling blocks between RAM
and disk -- it's pretty good at that (well, Linux is anyway).

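#!/usr/bin/python3
# Usage: <script> FILE TEXT
import sys, mmap, os

# Open read-only; os.open() returns a raw file descriptor.
fd = os.open(sys.argv[1], os.O_RDONLY)
# Length 0 maps the whole file; the prot= keyword is Unix-only.
mm = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
# find() returns the byte offset of the first match, or -1 if none.
i = mm.find(bytes(sys.argv[2], encoding='UTF-8'))
print(i)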


The above code works for me, but I don't know how performance compares
with other methods.  I think the mmap() signature is slightly
different on Windows.
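
For what it's worth, here is a minimal sketch of a variant that should
also run on Windows, since it uses the access= keyword, which the mmap
module accepts on both platforms, instead of prot=.  The function name
find_offsets is made up for illustration and is not part of the script
above.  It reports every match rather than only the first.  Note that
the numbers are byte offsets into the file, not addresses in RAM, and
that mapping an empty file raises ValueError.

```python
import mmap


def find_offsets(path, needle):
    """Yield the byte offset of every occurrence of needle (bytes) in the file.

    Hypothetical helper for illustration; overlapping matches are included.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ works on Unix and Windows.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                yield pos
                # Resume one byte past the previous hit.
                pos = mm.find(needle, pos + 1)
```

Something like "for off in find_offsets(sys.argv[1],
sys.argv[2].encode('UTF-8')): print(off)" would then print one offset
per line.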

-- 
Grant Edwards               grant.b.edwards        Yow! Not SENSUOUS ... only
                                  at               "FROLICSOME" ... and in
                              gmail.com            need of DENTAL WORK ... in
                                                   PAIN!!!



