Trying to understand the memory occupation of big lists

Dave Angel davea at davea.name
Fri May 3 08:16:34 EDT 2013


On 05/03/2013 07:24 AM, Michele Simionato wrote:
> I have a memory leak in a program using big arrays.

Actually, big lists.  Python also has arrays, and they're entirely 
different.

> With the goal of debugging it I ran into the memory_profiler module.
> Then I discovered something which is surprising to me. Please consider
> the following script:
>
> $ cat memtest.py
> import gc
> from memory_profiler import profile
>
>
> @profile
> def test1():
>      a = [0] * 1024 * 1024
>      del a
>      gc.collect()  # nothing change if I comment this
>
>
> @profile
> def test2():
>      for i in range(10):
>          a = [0] * 1024 * 1024
>          del a
>      gc.collect()  # nothing change if I comment this
>
>
> test1()
> test2()
>
> Here is its output, on a Linux 64 bit machine:
>
> $ python memtest.py
> Filename: memtest.py
>
> Line #    Mem usage    Increment   Line Contents
> ================================================
>       5                             @profile
>       6     9.250 MB     0.000 MB   def test1():
>       7    17.246 MB     7.996 MB       a = [0] * 1024 * 1024
>       8     9.258 MB    -7.988 MB       del a
>       9     9.258 MB     0.000 MB       gc.collect()  # nothing change if I comment this
>
>
> Filename: memtest.py
>
> Line #    Mem usage    Increment   Line Contents
> ================================================
>      12                             @profile
>      13     9.262 MB     0.000 MB   def test2():
>      14    17.270 MB     8.008 MB       for i in range(10):
>      15    17.270 MB     0.000 MB           a = [0] * 1024 * 1024
>      16    17.270 MB     0.000 MB           del a
>      17    17.270 MB     0.000 MB       gc.collect()  # nothing change if I comment this
>
> In the first case the memory is released (even if strangely not
> completely, 7.996 != 7.988), in the second case the memory is not.
> Why is that so? I expected gc.collect() to free the memory, but it
> has no effect at all. In the second case there are 10 lists of 8 MB
> each, so 80 MB are allocated and 72 released, but 8 MB are apparently
> still there. It does not look like a problem of memory_profiler; this
> is what I observe with top too.
>
> Any ideas?
>

I haven't played with memory_profiler, so my comments are limited to the
code itself.

gc.collect() has nothing to do in either of these functions, since the
memory has already been released by the ref-count logic.  gc.collect()
is useful only in the case of a circular reference.  If you want to see
it in action, create two large objects that reference each other and a
small one that references one of them.  Del the first two and then the
third, and the memory cannot be released, since the ref counts are still
nonzero.  Then do a gc.collect(), which will realize that you have no
way to reference either of the two large objects and will free them.
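
A minimal sketch of that experiment, assuming CPython with its reference
counting. The Node class and the names holder and probe are hypothetical,
chosen only for this illustration; the weak reference is just a way to
check whether the object is still alive without keeping it alive.

```python
import gc
import weakref


class Node:
    """Hypothetical container, used only for this illustration."""

    def __init__(self, payload):
        self.payload = payload
        self.other = None


gc.disable()  # keep the automatic collector from running during the demo

# Two large objects that reference each other.
a = Node([0] * 1024 * 1024)
b = Node([0] * 1024 * 1024)
a.other = b
b.other = a

holder = [a]            # a small object that references one of them
probe = weakref.ref(a)  # does not count as a reference to a

del a, b    # the names are gone, but the cycle and holder remain
del holder  # now only the cycle itself keeps the ref counts nonzero

alive_before = probe() is not None  # ref counting alone cannot free the pair
unreachable = gc.collect()          # number of unreachable objects found
alive_after = probe() is not None   # the cycle collector has freed the pair

gc.enable()
print(alive_before, unreachable, alive_after)
```

Without the cross references (a.other and b.other), the two del statements
alone would free both objects and gc.collect() would find nothing to do,
which is the situation in test1() and test2().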

I suspect that memory_profiler is only looking at the memory from the point of 
view of the OS.  No block of memory can be released to the OS unless 
it's entirely freed.  My guess is that in the second case the variable i 
(or some other internal one relating to the loop) is in the same block 
with one of those lists.  The point is that CPython uses the C malloc() 
and free() functions, and they have their own limitations.  Most of the 
time when free() is called, the memory is NOT released to the OS, but is 
still made available within Python for future use.
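
One way to compare the two points of view is sketched below, assuming
Python 3 on CPython. sys.getsizeof() reports what the list object itself
occupies, while the hypothetical helper rss_bytes() reads the resident set
size from /proc/self/statm, which exists only on Linux; elsewhere it
returns None. Whether the resident size drops again after the del depends
on the C library's allocator, so no particular numbers are promised here.

```python
import os
import struct
import sys


def rss_bytes():
    """Resident set size as the OS sees it, or None where /proc is absent."""
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


n = 1024 * 1024
pointer_size = struct.calcsize("P")  # 8 on a 64-bit build

before = rss_bytes()
a = [0] * n
# List header plus n pointers; every slot points at the same int object 0.
list_bytes = sys.getsizeof(a)
during = rss_bytes()
del a  # the ref count drops to zero, so the list is freed immediately
after = rss_bytes()

print("list object:", list_bytes, "bytes")
print("RSS before/during/after:", before, during, after)
```

Running the allocation in a loop, as in test2(), and printing rss_bytes()
on each pass shows what the OS still attributes to the process after
Python has already handed the memory back to free().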


-- 
DaveA


