[pypy-commit] pypy default: (arigo, fijal) improve the doc

fijal pypy.commits at gmail.com
Thu Feb 15 09:58:44 EST 2018


Author: fijal
Branch: 
Changeset: r93825:bb02514372a2
Date: 2018-02-15 15:58 +0100
http://bitbucket.org/pypy/pypy/changeset/bb02514372a2/

Log:	(arigo, fijal) improve the doc

diff --git a/pypy/doc/gc_info.rst b/pypy/doc/gc_info.rst
--- a/pypy/doc/gc_info.rst
+++ b/pypy/doc/gc_info.rst
@@ -15,14 +15,41 @@
 processes) and cache sizes you might want to experiment with it via
 *PYPY_GC_NURSERY* environment variable. When the nursery is full, there is
 performed a minor collection. Freed objects are no longer referencable and
-just die, without any effort, while surviving objects from the nursery
-are copied to the old generation. Either to arenas, which are collections
-of objects of the same size, or directly allocated with malloc if they're big
-enough.
+just die, simply by no longer being referenced; on the other hand, objects
+found to still be alive must survive and are copied from the nursery
+to the old generation, either into arenas, which are collections
+of objects of the same size, or directly allocated with malloc if they're
+larger.  (Very large objects, a third category, are allocated outside
+the nursery from the start and never move.)
 
 Since Incminimark is an incremental GC, the major collection is incremental,
 meaning there should not be any pauses longer than 1ms.
 
+
+Fragmentation
+-------------
+
+Before we discuss issues of "fragmentation", we need a bit of precision.
+There are two kinds of related but distinct issues:
+
+* If the program allocates a lot of memory, and then frees it all by
+  dropping all references to it, then we might expect to see the RSS
+  drop.  (RSS = Resident Set Size on Linux, as seen by "top"; it is an
+  approximation of the actual memory usage from the OS's point of view.)
+  This might not occur: the RSS may remain at its highest value.  This
+  issue is more precisely caused by the process not returning "free"
+  memory to the OS.  We call this case "unreturned memory".
+
+* After doing the above, if the RSS didn't go down, then at least future
+  allocations should not cause the RSS to grow more.  That is, the process
+  should reuse unreturned memory as long as it has any left.  If this
+  does not occur, the RSS grows even larger and we have real fragmentation
+  issues.
+
+
+gc.get_stats
+------------
+
 There is a special function in the ``gc`` module called
 ``get_stats(memory_pressure=False)``.
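You can tell the two cases described under "Fragmentation" apart empirically by watching RSS. The following is a minimal, Linux-only sketch and is not something PyPy ships. ``current_rss_bytes``, ``probe_memory`` and ``pypy_gc_report`` are hypothetical helper names, and RSS is read from ``/proc/self/statm``. RSS numbers are noisy, so treat them as hints rather than measurements.

```python
import gc
import os
import sys


def current_rss_bytes():
    """Current Resident Set Size in bytes, or None if it cannot be read.

    Linux-only: the second field of /proc/self/statm is the number of
    resident pages.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def probe_memory(n_objects=1_000_000):
    """Return RSS before, at peak, after freeing, and after reallocating."""
    before = current_rss_bytes()
    data = [[i] for i in range(n_objects)]  # many small, long-lived objects
    peak = current_rss_bytes()
    del data                                # drop every reference
    gc.collect()                            # force a full collection
    after_free = current_rss_bytes()        # close to peak hints at unreturned memory
    data = [[i] for i in range(n_objects)]  # the same workload again
    after_realloc = current_rss_bytes()     # far above peak hints at fragmentation
    del data
    return before, peak, after_free, after_realloc


def pypy_gc_report():
    """PyPy's gc.get_stats() report as text, or None on other interpreters."""
    if sys.implementation.name != "pypy":
        return None  # CPython also has gc.get_stats(), but it reports something else
    return str(gc.get_stats())
```

On PyPy, you can compare ``after_free`` and ``after_realloc`` with the "allocated" and "used" figures from ``pypy_gc_report()``. This helps narrow down whether the extra memory sits in the arenas or in the rawmalloced objects.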
 
@@ -56,19 +83,32 @@
     
 In this particular case, which is just at startup, GC consumes relatively
 little memory and there is even less unused, but allocated memory. In case
-there is a high memory fragmentation, the "allocated" can be much higher
-than "used". Generally speaking, "peak" will more resemble the actual
-memory consumed as reported by RSS, since returning memory to the OS is a hard
-and not solved problem.
+there is a lot of unreturned memory or actual fragmentation, the "allocated"
+can be much higher than "used".  Generally speaking, "peak" will more closely
+resemble the actual memory consumed as reported by RSS.  Indeed, returning
+memory to the OS is a hard and unsolved problem.  In PyPy, it occurs only if
+an arena is entirely free: a contiguous block of 64 pages of 4 or 8 KB each.
+It is also rare for the "rawmalloced" category, at least for common system
+implementations of ``malloc()``.
 
 The details of various fields:
 
-* GC in arenas - small old objects held in arenas. If the amount of allocated
-  is much higher than the amount of used, we have large fragmentation issue
+* GC in arenas - small old objects held in arenas. If the amount "allocated"
+  is much higher than the amount "used", we have unreturned memory.  It is
+  possible but unlikely that we have internal fragmentation here.  However,
+  this unreturned memory cannot be reused for any ``malloc()``, including
+  the allocations of the "rawmalloced" section.
 
-* GC rawmalloced - large objects allocated with malloc. If this does not
-  correspond to the amount of RSS very well, consider using jemalloc as opposed
-  to system malloc
+* GC rawmalloced - large objects allocated with malloc.  This gives the
+  current (first block of text) and peak (second block of text) memory
+  allocated with ``malloc()``.  The amount of unreturned memory or
+  fragmentation caused by ``malloc()`` cannot easily be reported.  Usually
+  you can guess there is some if the RSS is much larger than the total
+  memory reported for "GC allocated", but do keep in mind that this total
+  does not include malloc'ed memory not known to PyPy's GC at all.  If you
+  guess there is some, consider using `jemalloc`_ as opposed to system malloc.
+
+.. _`jemalloc`: http://jemalloc.net/
 
 * nursery - amount of memory allocated for nursery, fixed at startup,
   controlled via an environment variable
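A toy model can illustrate the rule that an arena goes back to the OS only when it is entirely free. This is a sketch under stated assumptions and not PyPy's actual arena allocator. ``ToyArenaHeap`` is hypothetical, memory is handed out in whole pages, and the page size is assumed to be 4 KB. The model shows how "allocated" can stay far above "used", and how a partially used arena is still reused before new memory is requested.

```python
PAGES_PER_ARENA = 64  # from the text: an arena is 64 contiguous pages
PAGE_SIZE = 4096      # assumption: 4 KB pages (8 KB on some platforms)


class ToyArenaHeap:
    """Toy model of unreturned memory; NOT PyPy's real arena allocator.

    Memory is handed out one page at a time from arenas, and an arena is
    returned to the OS only once every one of its pages is free again.
    """

    def __init__(self):
        self.arenas = {}        # arena id -> set of page numbers in use
        self.next_arena_id = 0

    def allocate_page(self):
        # Reuse a free page in an existing arena before asking the OS for more.
        for arena_id, used in self.arenas.items():
            for page in range(PAGES_PER_ARENA):
                if page not in used:
                    used.add(page)
                    return arena_id, page
        arena_id = self.next_arena_id  # no free page anywhere: new arena
        self.next_arena_id += 1
        self.arenas[arena_id] = {0}
        return arena_id, 0

    def free_page(self, arena_id, page):
        used = self.arenas[arena_id]
        used.remove(page)
        if not used:
            del self.arenas[arena_id]  # only an entirely free arena goes back

    def used_bytes(self):
        return sum(len(used) for used in self.arenas.values()) * PAGE_SIZE

    def allocated_bytes(self):
        return len(self.arenas) * PAGES_PER_ARENA * PAGE_SIZE


# Fill two arenas, then free all but one page in each of them.
heap = ToyArenaHeap()
pages = [heap.allocate_page() for _ in range(2 * PAGES_PER_ARENA)]
for arena_id, page in pages:
    if page != 0:
        heap.free_page(arena_id, page)
print(heap.used_bytes(), heap.allocated_bytes())  # 8192 524288
```

In this model, a single live page is enough to keep all 64 pages of its arena from being returned. That is the pattern behind a "GC in arenas" entry whose "allocated" is much larger than "used".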
@@ -91,7 +131,7 @@
 
 ``PYPY_GC_NURSERY``
     The nursery size.
-    Defaults to 1/2 of your cache or ``4M``.
+    Defaults to 1/2 of your last-level cache, or ``4M`` if unknown.
     Small values (like 1 or 1KB) are useful for debugging.
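The documented default for ``PYPY_GC_NURSERY`` can be restated as a small function. This is a hypothetical sketch, not PyPy's code. ``parse_size`` and ``effective_nursery_size`` are invented names, and the caller supplies the last-level cache size instead of the function detecting it. The accepted spellings are only those mentioned here (``1``, ``1KB``, ``4M``), and the sketch does not model any minimum size PyPy may enforce.

```python
FALLBACK_NURSERY = 4 * 1024 ** 2  # the documented "4M" fallback
_FACTORS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Parse '1', '1KB', '4M', ... into a number of bytes (hypothetical helper)."""
    s = text.strip().upper()
    if len(s) > 1 and s.endswith("B"):
        s = s[:-1]                    # accept both '1K' and '1KB'
    factor = _FACTORS.get(s[-1:], 1)
    if factor != 1:
        s = s[:-1]                    # drop the unit letter
    return int(float(s) * factor)


def effective_nursery_size(env, last_level_cache_bytes=None):
    """Restate the documented rule: the variable wins, else half of the
    last-level cache, else 4M. The cache size is supplied by the caller."""
    value = env.get("PYPY_GC_NURSERY")
    if value:
        return parse_size(value)
    if last_level_cache_bytes:
        return last_level_cache_bytes // 2
    return FALLBACK_NURSERY


print(effective_nursery_size({}, 8 * 1024 ** 2))          # 4194304
print(effective_nursery_size({"PYPY_GC_NURSERY": "1KB"}))  # 1024
```

To actually change the nursery size, set the variable in the environment of the ``pypy`` process, for example ``PYPY_GC_NURSERY=1KB pypy script.py``.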
 
 ``PYPY_GC_NURSERY_DEBUG``

