memory leak with large list??

Tim Peters tim.one at comcast.net
Sat Jan 25 07:41:42 EST 2003


[someone]
>> Then I generate a large list (12 million floats), either using map,
>> a list comprehension, or preallocating a list, then filling it using a
>> for loop.

[Terry Reedy]
> 12 million different floats use 12 million * sizeof(float object)
> bytes.  A float object is at least 8 bytes for the float + object
> overhead (about 4 bytes?).

Each object has at least a pointer to the object's type object, and a
refcount, for at least 8 bytes of overhead.  So a float object consumes at
least 16 bytes.

> list has an array of 12 million pointers (4 (or possibly 8) bytes each).
> Thus at least 16 * 12 million, roughly 200 million bytes, regardless of
> method.

At least 192 million for the float objects, and at least 48 million for the
list.
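
For the curious, here's the back-of-the-envelope arithmetic as a tiny Python
snippet (assuming a 32-bit build with 4-byte list pointers and 16-byte float
objects, as above):

n = 12 * 10**6              # 12 million floats
float_bytes = 16 * n        # 8-byte double + ~8 bytes of object overhead each
list_bytes = 4 * n          # one 4-byte pointer per list slot
print((float_bytes + list_bytes) / 2.0**20)   # ~229 MB, in the ballpark of the RSS quoted below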

>> When I check the memory footprint again, it is larger than I would
>> expect, but I'm not sure what the per element overhead of a list is.
>>
>> %MEM  RSS  RSS    SZ   VSZ
>> 46.2 237648 237648 59690 238760


> If 237648 is the # of Kbytes, it is roughly what one should expect.

Yup.

>> So then delete it.
>>
>> >>> del(big_list)
>>
>> But the memory footprint of the process remains very large:
>>
>> %MEM  RSS  RSS    SZ   VSZ
>> 37.1 190772 190772 47971 191884

> Deleting frees memory for reuse by Python but does not necessarily return
> memory to the system for reuse by other processes.  (Exact behavior is
> *very* system-specific, according to Tim Peters' war stories.)

It's worse in this case:  int and float objects come out of special internal
type-specific "free lists", and there's no bound on how large those can get.
Here's the deallocator for floats:

static void
float_dealloc(PyFloatObject *op)
{
	if (PyFloat_CheckExact(op)) {
		op->ob_type = (struct _typeobject *)free_list;
		free_list = op;
	}
	else
		op->ob_type->tp_free((PyObject *)op);
}
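
If you want to watch that free list at work, here's a quick sketch.  It leans
on CPython implementation details (id() returning an object's address, and
nothing else allocating a float in between), so treat it as an illustration
rather than a guarantee:

import time

a = time.time()        # a float created at runtime (a literal could be cached)
addr = id(a)           # on CPython, id() is the object's memory address
del a                  # refcount drops to 0; the float goes onto the free list
b = time.time()        # the next float allocation pops that same slot
print(id(b) == addr)   # usually True on CPython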

IOW, memory once allocated for a float can never be reused for any other
kind of object, and isn't returned to the platform C library until Python
shuts down.  The list memory did get returned to the platform C library, and
in this case it looks like the latter did return that chunk to the OS.  If
the OP had allocated some other "large object" after allocating the list,
chances are good that the platform C library would not have returned the list
memory to the OS.  Even then, it doesn't much matter -- the OS will simply
page out that part of the address space if it's not used again.  The VM
high-water mark by itself has little effect on performance.
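
To reproduce the OP's observation directly, a sketch along these lines works
on Linux (reading VmRSS out of /proc/self/status is a platform-specific
assumption, and the list is scaled down from 12 million to 2 million floats
to be polite).  On the Pythons under discussion here, with their unbounded
int/float free lists, expect the "deleted" figure to stay close to the
"allocated" one; newer CPythons cap the float free list and recycle the
memory through pymalloc instead, so the exact numbers will differ:

def rss_kb():
    # Linux-specific: resident set size in KB, from /proc/self/status.
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])

print('start     %9d KB' % rss_kb())
big_list = [float(i) for i in range(2 * 10**6)]   # 2 million distinct floats
print('allocated %9d KB' % rss_kb())
del big_list
print('deleted   %9d KB' % rss_kb())
big_list = [float(i) for i in range(2 * 10**6)]   # refill: the freed memory gets recycled
print('reused    %9d KB' % rss_kb())

Either way, the moral is the same:  what the OS reports as the process's
footprint is not a direct measure of how much memory Python has free for
reuse.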





