[Patches] Fix for memory leak in _PyTuple_Resize

Charles G Waldman cgw@fnal.gov
Sun, 16 Apr 2000 21:19:19 -0500 (CDT)


Jeremy Hylton writes:
 > 
 > Question about this patch: Is it always more efficient to re-use a
 > tuple (and update all the ob_item pointers)?  Or are there cases where
 > it would be more efficient to realloc and avoid the cost of updating
 > the pointers?

According to Webster's, "efficient" has two meanings:
1: immediately effecting 
2: productive of desired effects; esp : productive without waste 


Realloc might be (?) more efficient in sense 1 (speed), but certainly
not in sense 2 (waste).  It just seems ecologically wrong to be
keeping this heap of used tuples around for recycling, then going to
the store and buying a new one when your old one doesn't fit any more,
instead of looking on the recycling heap first.  If speed is a great
concern, the copying of the ob_item pointers could be done with a
single memcpy (I thought about doing it this way but the for-loop
seemed clearer).
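
To make the comparison concrete, here is a toy model of the two
strategies.  It is not the actual patch, and not the real
tupleobject.c structures; all the names are made up and error checking
is omitted.  It just illustrates "look on the recycling heap first and
copy the item pointers across" versus "realloc the same block":

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>

    #define MAXSAVESIZE 20            /* recycle tuples up to this size */

    typedef struct toy_tuple {
        size_t size;
        struct toy_tuple *next;       /* link used while on the free list */
        void *ob_item[1];             /* really 'size' slots */
    } toy_tuple;

    static toy_tuple *free_tuples[MAXSAVESIZE];   /* one list per size */

    static toy_tuple *new_tuple(size_t n)
    {
        toy_tuple *t;
        if (n < MAXSAVESIZE && free_tuples[n] != NULL) {
            t = free_tuples[n];       /* found one on the recycling heap */
            free_tuples[n] = t->next;
        }
        else {
            t = malloc(sizeof(toy_tuple) + n * sizeof(void *));
            t->size = n;
        }
        return t;
    }

    static void free_tuple(toy_tuple *t)
    {
        if (t->size < MAXSAVESIZE) {  /* recycle instead of freeing */
            t->next = free_tuples[t->size];
            free_tuples[t->size] = t;
        }
        else
            free(t);
    }

    /* The strategy of the patch: get a tuple of the new size (reusing a
       recycled one if possible) and copy the old item pointers across.
       The copy could just as well be a for-loop; memcpy is the terse form. */
    static toy_tuple *resize_by_reuse(toy_tuple *old, size_t newsize)
    {
        toy_tuple *t = new_tuple(newsize);
        size_t ncopy = old->size < newsize ? old->size : newsize;
        memcpy(t->ob_item, old->ob_item, ncopy * sizeof(void *));
        free_tuple(old);              /* the old one goes back on the heap */
        return t;
    }

    /* The realloc alternative: possibly faster, but it bypasses the
       recycling heap and asks the allocator for fresh memory instead. */
    static toy_tuple *resize_by_realloc(toy_tuple *old, size_t newsize)
    {
        toy_tuple *t = realloc(old, sizeof(toy_tuple) + newsize * sizeof(void *));
        t->size = newsize;
        return t;
    }

    int main(void)
    {
        toy_tuple *t = new_tuple(1000000);   /* a big over-allocation...   */
        t = resize_by_reuse(t, 3);           /* ...shrunk to what was used */
        t = resize_by_realloc(t, 5);         /* the other strategy, for contrast */
        printf("resized to %lu slots\n", (unsigned long)t->size);
        free_tuple(t);
        return 0;
    }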

 > There's another problem with the current strategy for reusing tuples
 > that we might fix at the same time.  If a program creates a large
 > number of tuples, later releases them all, and never creates any more
 > tuples of the same size, that memory will live forever on the free
 > list.  If I have a program that creates a million 3-tuples at one
 > blow, that memory will never get freed.

Making a limit on the size of the recycling heap is a different issue
and maybe that should be done as well, but I don't think that's a
reason not to fix _PyTuple_Resize.
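
For what it is worth, the cap itself would be easy to write.  Something
like the sketch below (continuing the toy model above) would do it;
MAX_FREE and num_free are made-up names, and the cut-off of 1024 is
just one of the numbers batted around below:

    #define MAX_FREE 1024             /* arbitrary cut-off per size class */

    static size_t num_free[MAXSAVESIZE];     /* current length of each list */

    static void free_tuple_capped(toy_tuple *t)
    {
        if (t->size < MAXSAVESIZE && num_free[t->size] < MAX_FREE) {
            t->next = free_tuples[t->size];
            free_tuples[t->size] = t;
            num_free[t->size]++;
        }
        else
            free(t);                  /* the list is full; give it back */
    }
    /* (the matching num_free[n]-- in new_tuple() is left out here) */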

I thought about adding this too, but chose not to for the following
three reasons: (1) I have to choose an arbitrary value for the cut-off
and I
don't like to impose arbitrary limits.  Do you want to keep 42 free
tuples?  137?  1024?  Somebody else come up with the number and I'll
put that in too, except that (2) I am not really 100% sure that the
behavior you describe is a problem.  Maybe the function that creates
the million 3-tuples will run again and need to create another million
3-tuples and having them all cached will be advantageous.  Or maybe
not.  But (3) I have been thinking about memory leaks and "ecology"
and it seems to me to be a desirable property that you be able to call
a function f() then call it again and again and not use up more memory
on each call.  (Of course I'm assuming you use the same parameters and
f isn't deliberately constructed to consume memory...)  To me,
continuing to leak a little bit of memory on each call is worse than
doing a huge alloc one time.... it's OK if the high-water mark goes up
way high and doesn't recede, as long as it doesn't keep inching up.

When I iterate the tests in the Python Lib/test package and plot the
interpreter's memory footprint vs. iteration number, I see two
distinct patterns - a monotone increasing ramp, or a "mesa".  If I see
the ramp, I know right away that there is leaky behavior, and I can
try to fix it.  (Right now I see this with the modules test_cpickle,
test_extcall, test_nis, and test_unicode, and I hope to fix all of
these leaks, time permitting).
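
A sketch of one way to take this kind of per-iteration measurement is
below.  It uses getrusage() and is only an illustration (it is not how
I actually took the numbers above, and the units of ru_maxrss vary
from platform to platform); the shape of the output is what matters:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Stand-in for one run of the code under suspicion, e.g. one pass
       through a test module. */
    static void iteration_under_test(void)
    {
        /* ... exercise the code that might be leaking ... */
    }

    int main(void)
    {
        struct rusage ru;
        int i;
        for (i = 0; i < 100; i++) {
            iteration_under_test();
            getrusage(RUSAGE_SELF, &ru);
            /* A "mesa" flattens out after a while; a ramp keeps inching
               up iteration after iteration, which is the leak signature. */
            printf("%d %ld\n", i, ru.ru_maxrss);
        }
        return 0;
    }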

With a limit on the number of tuples in the freelist, there would be a
third type of pattern - a ramp which flattens off when a hard-limit is
reached.  This seems like it would be harder to detect if you are
doing memory-leak testing.

 > It should be possible to fix this problem by keeping track of the size
 > of each freelist and never letting it exceed some maximum size.  If it
 > actually ends up being more efficient to realloc than copy, then this
 > patch would limit the amount of loss caused by oddball cases like
 > calling PyObject_Sequence on an instance that overstates its size.

Yes, it's an oddball case, but it came from Python's own test suite!
(I'm not devious enough to have come up with this one, I don't think).
Ideally, no matter what you throw at it, you should not be able to get
Python to exhibit signs of insidious memory leaks, and especially not
by running the built-in test suite.  "Limiting the loss" is less
elegant than not allowing it to happen in the first place.  If the
limit were 1024, I could run the "test_extcall" test 1000 times and
still think I saw a memory leak.