[Python-ideas] Ideas towards GIL removal
Talin
talin at acm.org
Sun Apr 15 19:12:04 CEST 2007
Greg Ewing wrote:
> 2) Objects owned by a thread
>
> Python code creates and destroys temporary objects
> at a high rate -- stack frames, argument tuples,
> intermediate results, etc. If the code is executed
> by a thread, those objects are rarely if ever seen
> outside of that thread. It would be beneficial if
> refcount operations on such objects could be carried
> out by the thread that created them without locking.
While we are on the topic of reference counting, I'd like to direct your
attention to Recycler, an IBM research project:
"The Recycler is a concurrent multiprocessor garbage collector with
extremely low pause times (maximum of 6 milliseconds over eight
benchmarks) while remaining competitive with the best throughput-
oriented collectors in end-to-end execution times. This paper describes
the overall architecture of the Recycler, including its use of reference
counting and concurrent cycle collection, and presents extensive
measurements of the system comparing it to a parallel, stop-the-world
mark-and-sweep collector."
There are a bunch of research papers describing Recycler which can be
found at the following link:
http://www.research.ibm.com/people/d/dfb/publications.html
I'd start with the papers entitled "Java without the Coffee Breaks: A
Non-intrusive Multiprocessor Garbage Collector" and "Concurrent Cycle
Collection in Reference Counted Systems".
Let me describe a bit about how the Recycler works and how it relates to
what you've proposed.
The basic idea is that for each thread, there is a set of thread local
data (TLD) that contains a pair of "refcount buffers", one buffer for
increfs and one buffer for decrefs. Each refcount buffer is a flat array
of pointers which starts empty and gradually fills up.
The "incref" operation does not actually touch the reference count field
of the object. Instead, an "incref" appends a pointer to the object to
the end of the incref buffer for that thread. Similarly, a decref
operation appends a pointer to the object to the decref buffer. Since
the refcount buffers are thread-local, there is no need for locking or
synchronization.
When one of the buffers gets full, both buffers are swapped out for new
ones, and the old buffers are placed on a queue which is processed by
the collector thread. The collector thread is the only thread which is
allowed to actually touch the reference counts of the individual
objects, and its the only thread which is allowed to delete objects.
Processing the buffers is relatively simple: First, the incref buffer is
processed. The collector thread scans through each pointer in the
buffer, and increments the refcount of each object. Then the decref
buffer is processed in a similar way, decrementing the refcount.
However, it also needs to process the buffers for the other threads
before the memory can be reclaimed. Recycler defines a series of
"epochs" (i.e. intervals between collections). Within a refcount buffer,
each epoch is represented as a contiguous range of values within the
array -- all of the increfs and decrefs which occurred during that
epoch. An auxiliary array records the high water mark for each epoch.
Using this information, the collector thread is able to process only
those increfs and decrefs for all threads which occurred before the
current epoch. This does mean that objects whose refcount falls to zero
during the current epoch will not be deleted until the next collection
cycle.
Recycler also handles cyclic garbage via cycle detection, which is
described in the paper. It does not use a "mark and sweep" type of
algorithm, but is instead able to detect cycles locally without scanning
the entire heap.
Thus, the Recycler's use of refcount buffers achieves what you were
trying to achieve, which is refcounting without locking. However, it
does require access to thread-local data for each incref / release
operation. The performance of this scheme will greatly depend on how
quickly the code can get access to thread local data. The fastest
possible method would be to dedicate a register, but that's infeasible
on most systems. Another idea is for large functions to look up the TLD
and stuff it in a local variable at the beginning of the function. For
older source code the existing, backwards-compatible incref and decref
macros could each individually get access to the TLD, but these would be
slower than the more optimized methods in which the TLD was supplied as
a parameter.
-- Talin
More information about the Python-ideas
mailing list