[Python-ideas] Threading hooks and disable gc per thread

Christian Heimes lists at cheimes.de
Sun May 15 03:04:28 CEST 2011


On 14.05.2011 21:21, Gregory P. Smith wrote:
> Makes sense to me.
> 
> Something that needs clarifying: when the process dies (main python
> thread has exited and all remaining python threads are daemon threads)
> the on thread end hook will _not_ be called.

Good catch! This gotcha should be mentioned in the docs. A daemon thread
can end at any point in its life cycle. It's not an issue for my use
case. For JCC the hook just frees some resources that are freed anyway
when the process ends. Other use cases may need more deterministic
cleanup, but that's out of scope for my proposal. Users can work
around the issue with an atexit hook, though.
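
A minimal sketch of the atexit workaround, assuming the cleanup can run
from the main thread at shutdown. The names release_resources and worker
are made up for illustration and are not part of the proposal or of JCC.

```python
import atexit
import threading
import time

cleaned = []  # records what was released, for illustration only


def release_resources(name):
    # Hypothetical cleanup; stands in for whatever a thread-end hook
    # would have freed for this thread.
    cleaned.append(name)


def worker():
    # Daemon thread: it is simply abandoned at interpreter exit, so a
    # thread-end hook would never fire for it.
    while True:
        time.sleep(0.1)


# atexit callbacks run during normal interpreter shutdown, while daemon
# threads may still be running.
atexit.register(release_resources, "worker-resources")

threading.Thread(target=worker, daemon=True).start()
```

Note that atexit callbacks are skipped when the process is killed by a
signal Python does not handle, on a fatal internal error, or when
os._exit() is called, so this is still not fully deterministic.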

> This also sounds useful since we are a long long way from concurrent
> gc.  (and whenever we gain that, we'd need a way to control when it
> can or can't happen or to register the gc threads with anything
> that needs to know about 'em, JCC, etc..)

I thought of a concurrent GC, too. A dedicated GC thread could improve
the response time of a GUI or web application if we could separate the
cyclic garbage detection into two steps. Even on a fast machine, a full
GC sweep with millions of objects in gen2 can take up to a second,
during which the interpreter is locked. I assume that scanning a
million objects takes most of the time. If it were possible to run the
scan without the GIL held and then remove the objects in a second step
with the GIL acquired, response time could improve. However, that would
require a major redesign of the traverse and visit slots.
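
One rough way to check the pause length on a given machine is to time a
full collection while many tracked objects are alive. This is only a
measurement sketch; Node, make_cycles and time_full_collection are
hypothetical helpers, and the numbers depend heavily on the interpreter
version and the object graph.

```python
import gc
import time


class Node:
    """Small GC-tracked object that can take part in a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycles(count):
    """Create `count` two-object cycles that only the cyclic GC can free."""
    for _ in range(count):
        a, b = Node(), Node()
        a.partner, b.partner = b, a


def time_full_collection(live_objects, garbage_cycles):
    """Return (seconds, unreachable_found) for one full collection."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections out of the measurement
    try:
        gc.collect()  # start from a clean state
        survivors = [Node() for _ in range(live_objects)]  # must be scanned
        make_cycles(garbage_cycles)
        start = time.perf_counter()
        found = gc.collect()  # full collection over all generations
        elapsed = time.perf_counter() - start
        del survivors
    finally:
        if was_enabled:
            gc.enable()
    return elapsed, found
```

Calling time_full_collection(1000000, 1000) gives a feel for how the
pause grows with the number of live objects that have to be traversed.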

Back to my proposal. My initial proposal was missing one feature. It
should be possible to alter the default setting for
PyThreadState->gc_enabled, too. JCC could use the additional API to make
sure non-attached threads don't run the GC.

Example of how JCC could use the feature:
lucene.initVM() initializes the Java VM and attaches the current thread.
This is usually done in the main thread before any other thread is
started. The function would call PyThread_set_gc_enabled(0) to set the
default value for new thread states and to prevent any new thread from
starting a cyclic GC collection.

lucene.getVM().attachCurrentThread() creates some thread local objects
in a TLS and registers the current thread with the Java VM. This would run
PyObject_GC_set_thread_enabled(1) to allow GC collection in the current
thread.

lucene.getVMEnv().detachCurrentThread() cleans up the TLS and
unregisters the thread, so a PyObject_GC_set_thread_enabled(0) is required.
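
The life cycle above can be modelled in pure Python. This is only a
sketch of the intended semantics, not the patch: every name below is
hypothetical, the flags are plain Python values rather than fields of
PyThreadState, and a thread picks up the default on first query instead
of at thread state creation. That init_vm() enables the flag for the
calling thread is an assumption based on initVM attaching that thread.

```python
import threading

# Hypothetical model; none of these names exist in CPython or JCC.
_default_enabled = True       # models the static default for new threads
_state = threading.local()    # models PyThreadState->gc_enabled


def set_default_gc_enabled(flag):
    global _default_enabled
    _default_enabled = bool(flag)


def set_thread_gc_enabled(flag):
    _state.gc_enabled = bool(flag)


def thread_gc_enabled():
    # A thread with no explicit setting inherits the current default.
    if not hasattr(_state, "gc_enabled"):
        _state.gc_enabled = _default_enabled
    return _state.gc_enabled


def attach_current_thread():
    set_thread_gc_enabled(True)    # attached threads may run the GC


def detach_current_thread():
    set_thread_gc_enabled(False)   # detached threads must not


def init_vm():
    set_default_gc_enabled(False)  # new threads start with GC disabled
    attach_current_thread()        # assumption: initVM attaches the caller


def observe_worker():
    """Record the flag in a fresh thread before attach, after attach,
    and after detach."""
    seen = []

    def worker():
        seen.append(thread_gc_enabled())
        attach_current_thread()
        seen.append(thread_gc_enabled())
        detach_current_thread()
        seen.append(thread_gc_enabled())

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen
```

After init_vm() in the main thread, a new worker sees the flag disabled
until it attaches, and disabled again once it detaches.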

The implementation is rather simple:
 - a new static int variable for the default setting and a new flag in
   the PyThreadState struct
 - check PyThreadState_Get()->gc_enabled in _PyObject_GC_Malloc()
 - four small functions to set and get the default and thread settings
 - three Python functions in the gc module to enable, disable and get
   the flag from the current PyThreadState
 - a function to get the global flag. I'm not sure if we should expose
   the global switch to Python code.
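
The check in the allocation path can be illustrated with a toy model.
This is a sketch under assumptions, not the attached patch: GCModel and
its methods are hypothetical, the threshold is arbitrary, and it assumes
that allocations made while the flag is off still count, so the
collection is deferred to the next allocation on an enabled thread.

```python
import threading


class GCModel:
    """Toy model of an allocation counter with a per-thread GC switch."""

    def __init__(self, threshold):
        self.threshold = threshold       # allocations before a collection
        self.count = 0                   # allocations since last collection
        self.collections = 0             # how many collections were started
        self._tls = threading.local()    # stands in for the thread state

    def set_thread_enabled(self, flag):
        self._tls.enabled = bool(flag)

    def thread_enabled(self):
        # Assumption: threads without an explicit setting are enabled.
        return getattr(self._tls, "enabled", True)

    def allocate(self):
        # Models the check done when a GC-tracked object is allocated.
        self.count += 1
        if self.count > self.threshold and self.thread_enabled():
            self.count = 0
            self.collections += 1        # stands in for running a collection
```

With a threshold of 3, ten allocations on a disabled thread start no
collection; the first allocation after re-enabling the flag does.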

The attached patch already has all the C functionality. If I hear more
+1s, then I'll write two small PEPs, one for each feature request.

Christian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gc_thread.diff
Type: text/x-patch
Size: 3331 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20110515/e139e34b/attachment.bin>