Populating huge data structures from disk
Neil Cerutti
horpner at yahoo.com
Tue Nov 6 16:43:26 EST 2007
On 2007-11-06, Michael Bacarella <mbac at gpshopper.com> wrote:
> And there's no solace in lists either:
>
> $ time python eat800.py
>
> real 4m2.796s
> user 3m57.865s
> sys 0m3.638s
>
> $ cat eat800.py
> #!/usr/bin/python
>
> import struct
>
> d = []
> f = open('/dev/zero')
> for i in xrange(100000000):
>     d.append(struct.unpack('L', f.read(8))[0])
>
>
> cPickle with protocol 2 has some promise but is more complicated because
> arrays can't be pickled. In a perfect world I could do something like this
> somewhere in the backroom:
>
> x = lengthy_number_crunching()
> magic.save_mmap("/important-data")
>
> and in the application do...
>
> x = magic.mmap("/important-data")
> magic.mlock("/important-data")
>
> and once the mlock finishes bringing important-data into RAM, at
> the speed of your disk I/O subsystem, all accesses to x will be
> hits against RAM.
>
>
> Any thoughts?
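No `magic` module exists, but the stdlib `mmap` and `struct` modules get close to what you describe: write the raw bytes once, map the file read-only, and unpack values on demand without copying the whole file into a Python list. The OS pages data in as it's touched, so warm accesses are RAM hits. A minimal sketch — `save_longs` and `Mapped` are made-up names for illustration, and I've used the explicit `<Q` format since native `'L'` varies by platform; `numpy.memmap` gives you a real array interface over the same idea, and locking pages (`mlock`) would need ctypes since the stdlib doesn't expose it:

```python
import mmap
import os
import struct
import tempfile

def save_longs(path, values):
    # Dump a sequence of unsigned 64-bit integers as raw little-endian bytes.
    with open(path, 'wb') as f:
        for v in values:
            f.write(struct.pack('<Q', v))

class Mapped:
    # Read-only view over the file; the OS pages data in on first touch,
    # so repeated reads hit RAM rather than disk.
    def __init__(self, path):
        self._f = open(path, 'rb')
        self._m = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)

    def __getitem__(self, i):
        # Unpack one value in place; nothing else in the file is copied.
        return struct.unpack_from('<Q', self._m, i * 8)[0]

    def __len__(self):
        return len(self._m) // 8

# Demo with a small file in a temp directory.
path = os.path.join(tempfile.mkdtemp(), 'important-data')
save_longs(path, range(1000))
x = Mapped(path)
```

After this, `x[i]` is a cheap random access against the mapped region, which is the behavior the pseudocode above is asking for.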
Disable the garbage collector, use a while loop and manual index
instead of an iterator, preallocate your list, e.g.,
[None]*100000000, and hope they don't have blasters!
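Putting those three tweaks together looks roughly like this — a sketch, not a benchmark, using `io.BytesIO` as a stand-in for the real file and the explicit `<Q` format instead of the platform-dependent `'L'`:

```python
import gc
import io
import struct

def fill_preallocated(f, n):
    # Read n 8-byte unsigned ints from f into a preallocated list.
    d = [None] * n          # preallocate: the list never resizes in the loop
    gc.disable()            # skip GC passes triggered by mass allocation
    try:
        i = 0
        while i < n:        # manual index instead of an iterator
            d[i] = struct.unpack('<Q', f.read(8))[0]
            i += 1
    finally:
        gc.enable()
    return d

# Demo on an in-memory stand-in for the real data file.
raw = b''.join(struct.pack('<Q', v) for v in range(1000))
d = fill_preallocated(io.BytesIO(raw), 1000)
```

The gc.disable() is the big win here: with millions of appends, the collector repeatedly scans all those container objects for cycles that can't exist.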
--
Neil Cerutti