[Numpy-discussion] How to limit the numpy.memmap's RAM usage?

Charles R Harris charlesr.harris at gmail.com
Sat Oct 23 13:39:42 EDT 2010


On Sat, Oct 23, 2010 at 10:27 AM, braingateway <braingateway at gmail.com> wrote:

> Charles R Harris :
> >
> >
> > On Sat, Oct 23, 2010 at 10:15 AM, Charles R Harris
> > <charlesr.harris at gmail.com <mailto:charlesr.harris at gmail.com>> wrote:
> >
> >
> >
> >     On Sat, Oct 23, 2010 at 9:44 AM, braingateway
> >     <braingateway at gmail.com <mailto:braingateway at gmail.com>> wrote:
> >
> >         David Cournapeau :
> >
> >             2010/10/23 braingateway <braingateway at gmail.com
> >             <mailto:braingateway at gmail.com>>:
> >
> >
> >                 Hi everyone,
> >                 I noticed that numpy.memmap uses RAM to buffer data
> >                 from memmapped files.
> >                 If I have a 100GB array in a memmap file and process it
> >                 block by block,
> >                 the RAM usage keeps increasing as the process runs until
> >                 there is no available space in RAM (4GB), even though
> >                 the block size is only 1MB.
> >                 for example:
> >                 ####
> >                 import numpy as npy
> >                 a = npy.memmap('a.bin', dtype='float64', mode='r')
> >                 blocklen = 100000
> >                 b = npy.zeros((len(a) // blocklen,))
> >                 for i in range(0, len(a) // blocklen):
> >                     b[i] = npy.mean(a[i*blocklen:(i+1)*blocklen])
> >                 ####
> >                 Is there any way to restrict the memory usage in
> >                 numpy.memmap?
> >
> >
> >
> >             The whole point of using memmap is to let the OS do the
> >             buffering for
> >             you (which is likely to do a better job than you in many
> >             cases). Which
> >             OS are you using? And how do you measure how much memory
> >             is taken by
> >             numpy for your array?
> >
> >             David
> >
> >
> >         Hi David,
> >
> >         I agree with you about the point of using memmap. That is why
> >         the behavior is so strange to me.
> >         I actually measured the resident set size (pink trace in
> >         figure 2) of the Python process on Windows. I have attached the
> >         result. You can see that the RAM usage is definitely not file
> >         system cache.
> >
> >
> >     Umm, a good operating system will use *all* of RAM for buffering
> >     because RAM is fast and it assumes you are likely to reuse data
> >     you have already used once. If it needs some memory for something
> >     else it just writes a page to disk, if dirty, and reads in the new
> >     data from disk and changes the address of the page. Where you get
> >     into trouble is if pages can't be evicted for some reason. Most
> >     modern OS's also have special options available for reading in
> >     streaming data from disk that can lead to significantly faster
> >     access for that sort of thing, but I don't think you can do that
> >     with memmapped files.
> >
> >     I'm not sure how Windows labels its memory. IIRC, memmapping a
> >     file leads to what is called file-backed memory; it is essentially
> >     virtual memory. Now, I won't bet my life that there isn't a
> >     problem, but I think a misunderstanding of the memory information
> >     is more likely.
> >
> >
> > It is also possible that something else in your program is hanging
> > onto memory, but without knowing a lot more it is hard to tell. Are you
> > seeing symptoms besides the memory graphs? It looks like you aren't
> > running on Windows, actually, so what OS are you running on?
> >
> > Chuck
> > ------------------------------------------------------------------------
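As an aside on the "special options" for streaming reads mentioned above: on POSIX systems (where available) those hints are exposed to Python as os.posix_fadvise (Python 3.3+), and they apply to a plain file descriptor rather than to a numpy.memmap. A minimal sketch, assuming the same float64 file is read block by block without mapping it:

import os
import numpy as npy

blocklen = 100000  # elements per block

with open('a.bin', 'rb') as f:
    # hint that the file will be read sequentially from start to end
    os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
    while True:
        block = npy.fromfile(f, dtype='float64', count=blocklen)
        if block.size == 0:
            break
        block.mean()  # stand-in for the real per-block processing
        # tell the kernel the pages read so far can be dropped from the cache
        os.posix_fadvise(f.fileno(), 0, f.tell(), os.POSIX_FADV_DONTNEED)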
> >
> >
> Hi Chuck,
>
> Thanks a lot for the quick response. I ran the following super simple script
> on Windows:
>
> ####
> import numpy as npy
> a = npy.memmap('a.bin', dtype='float64', mode='r')
> blocklen = 100000
> b = npy.zeros((len(a) // blocklen,))
> for i in range(0, len(a) // blocklen):
>     b[i] = npy.mean(a[i*blocklen:(i+1)*blocklen])
> ####
> Everything became super slow after Python ate all the RAM.
> By the way, I also tried Qt's QFile::map(); there was no problem at all...
>
>
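If the goal is just to keep the resident set small while computing the block means, one workaround is to skip memmap entirely and read each block with numpy.fromfile, so that no mapping is ever held open. A rough sketch, assuming the same float64 file and block length as in the script above:

import numpy as npy

blocklen = 100000  # elements per block
means = []
with open('a.bin', 'rb') as f:
    while True:
        # read one block straight from the file; nothing stays mapped
        block = npy.fromfile(f, dtype='float64', count=blocklen)
        if block.size < blocklen:
            break  # stop at the last full block, as in the loop above
        means.append(block.mean())
b = npy.array(means)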
Hmm. Nothing looks suspicious. For reference, can you be specific about the
OS/version, Python version, and numpy version?

What happens if you simply do
for i in range(0, len(a) // blocklen):
    a[i*blocklen:(i+1)*blocklen].copy()
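
A slightly fuller version of that test, which also prints the resident set size as it goes (a sketch, assuming the psutil package is installed), would be something like:

import numpy as npy
import psutil

a = npy.memmap('a.bin', dtype='float64', mode='r')
blocklen = 100000
proc = psutil.Process()  # the current Python process

for i in range(len(a) // blocklen):
    a[i*blocklen:(i+1)*blocklen].copy()
    if i % 100 == 0:
        # resident set size in MB; if the OS is free to evict the file-backed
        # pages, this should stay roughly flat rather than grow without bound
        print(i, proc.memory_info().rss / 1e6)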

Chuck

