numpy.memmap advice?

Lionel lionel.keene at gmail.com
Thu Feb 19 12:34:43 EST 2009


On Feb 18, 12:35 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Feb 18, 10:48 am, Lionel <lionel.ke... at gmail.com> wrote:
>
> > Thanks Carl, I like your solution. Am I correct in my understanding
> > that memory is allocated at the slicing step in your example i.e. when
> > "reshaped_data" is sliced using "interesting_data = reshaped_data[:,
> > 50:100]"? In other words, given a huge (say 1Gb) file, a memmap object
> > is constructed that memmaps the entire file. Some relatively small
> > amount of memory is allocated for the memmap operation, but the bulk
> > memory allocation occurs when I generate my final numpy sub-array by
> > slicing, and this accounts for the memory efficiency of using memmap?
>
> No, what accounts for the memory efficiency is that there is no bulk
> allocation at all.  The ndarray you have points to the memory that's
> in the mmap.  There is no copying data or separate array allocation.
>
> Also, it's not any more memory efficient to use the offset parameter
> with numpy.memmap than it is to memmap the whole file and take a
> slice.
>
> Carl Banks
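
A minimal sketch of the two points quoted above, written under stated
assumptions: the file name, the 200 x 300 float64 layout, and the 10-row
skip are all hypothetical and chosen only for illustration. It checks that
a slice of a memmap shares memory with the mapping rather than copying it,
and that mapping with the offset parameter sees the same data as slicing a
whole-file mapping.

```python
import os
import tempfile

import numpy as np

# Hypothetical file layout: 200 rows x 300 columns of float64 values.
rows, cols = 200, 300
path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.arange(rows * cols, dtype=np.float64).tofile(path)

# Map the whole file read-only; this does not copy the file into an array.
reshaped_data = np.memmap(path, dtype=np.float64, mode="r", shape=(rows, cols))

# Slicing returns a view onto the same mapping, not a separate allocation.
interesting_data = reshaped_data[:, 50:100]
is_view = np.shares_memory(interesting_data, reshaped_data)

# Alternative: skip the first `skip` rows with the offset parameter (in bytes).
skip = 10
itemsize = np.dtype(np.float64).itemsize
with_offset = np.memmap(path, dtype=np.float64, mode="r",
                        offset=skip * cols * itemsize,
                        shape=(rows - skip, cols))

# Both routes expose the same values from the same file.
same_values = np.array_equal(with_offset, reshaped_data[skip:])

# An explicit copy is what actually allocates a separate in-memory array.
in_ram = np.array(interesting_data)
is_copy = not np.shares_memory(in_ram, reshaped_data)
```

Here only the final np.array(...) call allocates a separate array for the
sliced data; the slice and the offset mapping both refer to the mapped file.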

Does this mean that every time I iterate through an ndarray that is
sourced from a memmap, the data is read from the disk? The sliced
array is at no time wholly resident in memory? What are the
performance implications of this?


