mmap caching

George Sakkis george.sakkis at gmail.com
Sun Jan 21 16:32:19 EST 2007


Nick Craig-Wood wrote:

> George Sakkis <george.sakkis at gmail.com> wrote:
> >  I've been trying to track down a memory leak (which I initially
> >  attributed erroneously to numpy) and it turns out to be caused by a
> >  memory-mapped file. It seems that mmap caches the chunks it reads
> >  without limit, as memory usage grows to several hundred MB according
> >  to the Windows Task Manager before the process dies with a MemoryError.
> >  I'm positive that these chunks are not referenced anywhere else; in
> >  fact, if I change the mmap object to a normal file, memory usage
> >  remains constant. The documentation of mmap doesn't mention anything
> >  about this. Can the caching strategy be modified at the user level?
>
> I'm not familiar with mmap() on Windows, but assuming it works the
> same way as on Unix...
>
> The point of mmap() is to map files into memory.  It is completely up
> to the OS to bring pages into memory for you to read from or write to,
> and completely up to the OS to get rid of them again.
>
> What you would expect is that the file is demand-paged into memory as
> you access bits of it.  These pages will remain in memory until the OS
> comes under memory pressure, at which point they are written out if
> dirty and then dropped.
>
> The OS will try to keep hold of pages as long as possible just in case
> you need them again.  The pages dropped should be the least recently
> used pages.
>
> I wouldn't have expected a MemoryError though...
>
> Did you do mmap.flush() after writing?
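Nick's description of demand paging can be illustrated with a minimal Python 3 sketch (not from the thread). It maps a file read-only and walks it in fixed-size slices. Each slice of an `mmap` object is copied into a new `bytes` object, while the pages behind it are faulted in by the OS as they are touched. Those pages may stay resident, and may be counted in the process's memory figures, until the OS decides to evict them. The helper name `checksum_via_mmap` and the 64 KiB chunk size are illustrative assumptions.

```python
import mmap

def checksum_via_mmap(path, chunk_size=64 * 1024):
    """Sum all bytes of a non-empty file by reading it through a read-only map.

    Hypothetical helper for illustration: each slice copies up to chunk_size
    bytes into a new bytes object; the underlying pages are brought in on
    demand by the OS, which also decides when to drop them again.
    """
    total = 0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ works on Unix and Windows.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for start in range(0, len(mm), chunk_size):
                chunk = mm[start:start + chunk_size]  # bytes copy of this slice
                total += sum(chunk)
    return total
```

Once `chunk` goes out of scope, the Python-level copy is freed. Whether the mapped pages themselves stay resident is entirely the OS's decision, as Nick says.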

The file is written once and then opened read-only, so there's no
flushing. So if caching is completely up to the OS, I take it that my
options are either (1) modify my algorithms so that they work in
fixed-size batches instead of arbitrarily long sequences, or (2)
implement my own memory-mapping scheme to fit my algorithms. I guess
(1) would be less trouble overall, or is there a way to give the OS a
hint about how large a cache it can use?

George
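Option (2) need not mean a scheme written from scratch. The standard `mmap` module can map just one window of a file through its `offset` argument, which must be a multiple of `mmap.ALLOCATIONGRANULARITY`. The following is a minimal sketch assuming Python 3; the generator name `iter_windows` and the default window size are illustrative choices. Only one window is mapped at a time, and leaving the `with` block unmaps it. On platforms that provide it (Python 3.8+ on Unix, not Windows), `madvise(mmap.MADV_SEQUENTIAL)` is roughly the kind of hint asked about above. It is advisory only, and the OS is free to ignore it.

```python
import mmap
import os

def iter_windows(path, window_size=16 * mmap.ALLOCATIONGRANULARITY):
    """Yield a file's contents as bytes, mapping one window at a time.

    Hypothetical helper: window_size is rounded up to a multiple of
    mmap.ALLOCATIONGRANULARITY, because mmap's offset must be aligned to it.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    # Ceiling-round the window to the allocation granularity (at least one unit).
    window_size = max(gran, -(-window_size // gran) * gran)
    file_size = os.path.getsize(path)
    with open(path, "rb") as f:
        for offset in range(0, file_size, window_size):
            length = min(window_size, file_size - offset)
            with mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                           offset=offset) as mm:
                # Advisory hint only; madvise is absent on Windows and before 3.8.
                if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                    mm.madvise(mmap.MADV_SEQUENTIAL)
                yield mm[:]  # copy the window out before it is unmapped
```

A batch-oriented consumer, in the spirit of option (1), would process each yielded window and then discard it. The file is never mapped in full, so the mapped address range stays bounded by `window_size`.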
