Garbage collection

Steven D'Aprano steve at REMOVE.THIS.cybersource.com.au
Wed Mar 21 12:59:38 EDT 2007


On Wed, 21 Mar 2007 15:32:17 +0000, Tom Wright wrote:

>> Memory contention would be a problem if your Python process wanted to keep
>> that memory active at the same time as you were running GIMP.
> 
> True, but why does Python hang on to the memory at all?  As I understand it,
> it's keeping a big lump of memory on the int free list in order to make
> future allocations of large numbers of integers faster.  If that memory is
> about to be paged out, then surely future allocations of integers will be
> *slower*, as the system will have to:
> 
> 1) page out something to make room for the new integers
> 2) page in the relevant chunk of the int free list
> 3) zero all of this memory and do any other formatting required by Python
> 
> If Python freed (most of) the memory when it had finished with it, then all
> the system would have to do is:
> 
> 1) page out something to make room for the new integers
> 2) zero all of this memory and do any other formatting required by Python
> 
> Surely Python should free the memory if it's not been used for a certain
> amount of time (say a few seconds), as allocation times are not going to be
> the limiting factor if it's gone unused for that long.  Alternatively, it
> could mark the memory as some sort of cache, so that if it needed to be
> paged out, it would instead be de-allocated (thus saving the time taken to
> page it back in again when it's next needed).

And increasing the time it takes to re-create those objects in the cache
the next time they're needed.

Maybe this extra effort is worthwhile when the free int list holds 10**7
ints (on a 32-bit build each int object takes about 12 bytes, so that's
roughly 120 MB pinned), but is it worthwhile when it holds 10**6 ints?
10**5 ints? 10**3 ints, a mere 12 KB?

How many free ints is "typical" or even "common" in practice?
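
If you want to put rough numbers on that for your own system, a quick
probe along these lines will do it. This is only a sketch: read_rss() is
a little helper of my own, and it assumes a Linux-style /proc filesystem.

def read_rss():
    # Resident set size of this process, in kB (Linux /proc only).
    for line in open('/proc/self/status'):
        if line.startswith('VmRSS:'):
            return int(line.split()[1])

for power in (3, 5, 6, 7):
    ints = range(10 ** power)  # allocate 10**power int objects...
    del ints                   # ...freeing them feeds the free list, not the OS
    print '10**%d ints freed, RSS now %d kB' % (power, read_rss())

RSS only ratchets upward from pass to pass, because each batch of freed
ints stays parked on the free list.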

The lesson I get from this is: don't create such an enormous list of
integers with range() in the first place; use xrange() instead.

Fresh running instance of Python 2.5:

$ ps up 9579
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
steve     9579  0.0  0.2   6500  2752 pts/7    S+   03:42   0:00 python2.5


Run from within Python:

>>> n = 0
>>> for i in xrange(int(1e7)):
...     # create lots of ints, one at a time
...     # instead of all at once
...     n += i # make sure the int is used
...
>>> n
49999995000000L


And the output of ps again:

$ ps up 9579
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
steve     9579  4.2  0.2   6500  2852 pts/7    S+   03:42   0:11 python2.5

Barely moved a smidgen: RSS is up only about 100 kB, even though ten
million ints have been created and discarded.
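
Incidentally, the same lazy approach works anywhere an iterable will do.
sum() consumes the xrange one int at a time, so memory stays just as
flat, and it gives the same total (a long on a 32-bit build like this
one, hence the trailing L):

>>> sum(xrange(int(1e7)))
49999995000000L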

For comparison, here's what ps reports after I create a single list with
range(int(1e7)), and again after I delete the list:

$ ps up 9579 # after creating list with range(int(1e7))
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
steve     9579  1.9 15.4 163708 160056 pts/7   S+   03:42   0:11 python2.5

$ ps up 9579 # after deleting list
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
steve     9579  1.7 11.6 124632 120992 pts/7   S+   03:42   0:12 python2.5


Note what the second ps run tells us: deleting the list gives back the
roughly 39 MB of the list object itself (ten million pointers), but the
remaining ~118 MB is the ten million int objects, still parked on the int
free list. So there is another clear advantage to using xrange instead of
range, unless you specifically need all ten million ints at once.
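
And to tie this back to the subject line: the cyclic garbage collector is
no help here either, because the cached ints aren't garbage. The return
value of gc.collect() is the number of unreachable objects it found,
which I'd expect to be zero in a session like this, and RSS stays right
where it was:

>>> import gc
>>> gc.collect()  # reclaims reference cycles only; the free list is a cache
0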



-- 
Steven.



