Populating huge data structures from disk

Neil Cerutti horpner at yahoo.com
Tue Nov 6 16:43:26 EST 2007


On 2007-11-06, Michael Bacarella <mbac at gpshopper.com> wrote:
> And there's no solace in lists either:
>  
> $ time python eat800.py 
>
> real    4m2.796s
> user    3m57.865s
> sys     0m3.638s
>
> $ cat eat800.py 
> #!/usr/bin/python
>
> import struct
>
> d = []
> f = open('/dev/zero')
> for i in xrange(100000000):
>         d.append(struct.unpack('L',f.read(8))[0])
>
>
> cPickle with protocol 2 has some promise but is more complicated because
> arrays can't be pickled.  In a perfect world I could do something like this
> somewhere in the backroom:
>
> x = lengthy_number_crunching()
> magic.save_mmap("/important-data")
>
> and in the application do...
>
> x = magic.mmap("/important-data")
> magic.mlock("/important-data")
>
> and once the mlock finishes bringing important-data into RAM, at
> the speed of your disk I/O subsystem, all accesses to x will be
> hits against RAM.
>
> Any thoughts?
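
The closest stand-in I know of for that "magic" module is
numpy's memmap. A rough sketch, where lengthy_number_crunching
is a stub standing in for your real computation and I'm assuming
the data fits a uint64 array:

import numpy

def lengthy_number_crunching():
    # stand-in for the real computation in your post
    return numpy.arange(1000, dtype=numpy.uint64)

# Back room: crunch once, dump the raw words to disk.
x = lengthy_number_crunching()
x.tofile('/important-data')

# Application: map the file read-only; pages fault in from disk
# on first touch and then stay hot in the page cache.
x = numpy.memmap('/important-data', dtype=numpy.uint64, mode='r')

I don't know of an mlock wrapper in the stdlib, but touching
every page once (or just letting the page cache do its job)
gets you most of the way there.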

Disable the garbage collector, use a while loop with a manual
index instead of an iterator, and preallocate your list, e.g.
[None]*100000000 (roughly the sketch below), and hope they
don't have blasters!
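
An untested sketch of all three, sticking with your struct
approach and assuming a 64-bit box where the native 'L' format
is 8 bytes, as in your test:

import gc
import struct

N = 100000000

gc.disable()        # skip the GC passes all those allocations trigger
d = [None] * N      # preallocate so the list never has to grow
f = open('/dev/zero', 'rb')
i = 0
while i < N:        # manual index instead of an iterator
    d[i] = struct.unpack('L', f.read(8))[0]
    i += 1
f.close()
gc.enable()

Preallocating avoids the repeated resizes that append() causes,
and with the collector off you skip the collection passes that
the allocation counters would otherwise keep firing.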

-- 
Neil Cerutti


