Why GIL? (was Re: what's the point of rpython?)

Rhamphoryncus rhamph at gmail.com
Fri Jan 23 14:01:24 EST 2009


On Jan 22, 11:09 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> On Jan 22, 9:38 pm, Rhamphoryncus <rha... at gmail.com> wrote:
>
>
>
> > On Jan 22, 9:38 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
>
> > > On Jan 22, 6:00 am, a... at pythoncraft.com (Aahz) wrote:
>
> > > > In article <7xd4ele060.... at ruckus.brouhaha.com>,
> > > > Paul Rubin  <http://phr...@NOSPAM.invalid> wrote:
>
> > > > >alex23 <wuwe... at gmail.com> writes:
>
> > > > >> Here's an article by Guido talking about the last attempt to remove
> > > > >> the GIL and the performance issues that arose:
>
> > > > >> "I'd welcome a set of patches into Py3k *only if* the performance for
> > > > >> a single-threaded program (and for a multi-threaded but I/O-bound
> > > > >> program) *does not decrease*."
>
> > > > >The performance decrease is an artifact of CPython's rather primitive
> > > > >storage management (reference counts in every object).  This is
> > > > >pervasive and can't really be removed.  But a new implementation
> > > > >(e.g. PyPy) can and should have a real garbage collector that doesn't
> > > > >suffer from such effects.
>
> > > > CPython's "primitive" storage management has a lot to do with the
> > > > simplicity of interfacing CPython with external libraries.  Any solution
> > > > that proposes to get rid of the GIL needs to address that.
>
> > > I was recently on a long road trip, and was not the driver, and with
> > > nothing better to do thought quite a bit about this.
>
> > > I concluded that, aside from one major trap, it wouldn't really be
> > > more difficult to interface Python to external libraries, just
> > > differently difficult.  Here is briefly what I came up with:
>
> > > 1. Change the singular Python type into three metatypes:
> > > immutable_type, mutable_type, and mutable_dict_type.  (In the last
> > > case, the object itself is immutable but the dict can be modified.
> > > This, of course, would be the default metaclass in Python.)  Only
> > > mutable_types would require a mutex when accessing.
>
> > > 2. API wouldn't have to change much.  All regular API would assume
> > > that objects are unlocked (if mutable) and in a consistent state.
> > > It'll lock any mutable objects it needs to access.  There would also
> > > be a low-level API that assumes the objects are locked (if mutable)
> > > and does not require objects to be consistent.  I imagine most
> > > extensions would call the standard API most of the time.
>
> > > 3. If you are going to use the low-level API on a mutable object, or
> > > are going to access the object structure directly, you need to acquire
> > > the object's mutex. Macros such as Py_LOCK(), Py_LOCK2(), Py_UNLOCK()
> > > would be provided.
>
> > > 4. Objects would have to define a method, to be called by the GC, that
> > > marks every object it references.  This would be a lot like the
> > > current tp_traverse, except it has to be defined for any object that
> > > references another object, not just objects that can participate in
> > > cycles.  (A conservative garbage collector wouldn't suffice for Python
> > > because Python quite often allocates blocks but sets the pointer to an
> > > offset within the block.  In fact, that's true of almost any Python-
> > > defined type.)  Unfortunately, references on the stack would need to
> > > be registered as well, so "PyObject* p;" might have to be replaced
> > > with something like "Py_DECLARE_REF(PyObject,p);" which magically
> > > registers it.  Ugly.
>
> > > 5. Py_INCREF and Py_DECREF are gone.
>
> > > 6. GIL is gone.
>
> > > So, you gain the complexity of a two-level API, having to lock mutable
> > > objects sometimes, and defining more visitor methods than before, but
> > > you don't have to keep INCREFs and DECREFs straight, which is no small
> > > thing.
>
> > > The major trap is the possibility of deadlock.  To help minimize the
> > > risk there would be macros to lock multiple objects at once, such as
> > > Py_LOCK2(a,b), which guarantees that if another thread is calling
> > > Py_LOCK2(b,a) at the same time, it won't result in a deadlock.  What's
> > > disappointing is that the deadlocking possibility is always with you,
> > > much like the reference counts are.
>
> > IMO, locking of the object is a secondary problem.  Python-safethread
> > provides one solution, but it's not the only conceivable one.  For the
> > sake of discussion it's easier to assume somebody else is solving it
> > for you.
>
> That assumption might be good for the sake of the discussion *you*
> want to have, but it's not for the discussion I was having, which was to
> address Aahz's claim that GIL makes extension writing simple by
> presenting a vision of what Python might be like if it had a mark-and-
> sweep collector.  The details of the GC are a small part of that and
> wouldn't affect my main point even if they are quite different than I
> described.  Also, extension writers would have to worry about locking
> issues here, so it's not acceptable to assume somebody else will solve
> that problem.
>
> > Instead, focus on just the garbage collection.
>
> [snip rest of threadjack]
>
> You can ignore most of what I was talking about and focus on
> technicalities of garbage collection if you want to.  I will not be
> joining you in that discussion, however.
>
> Carl Banks

I'm sorry, you're right, I misunderstood your context.
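
For what it's worth, the Py_LOCK2(a,b) guarantee you describe is usually
obtained by always taking the two mutexes in one global order.  Below is a
rough sketch of that idea in Python rather than C.  Everything in it is
hypothetical: LockedObject, lock2 and unlock2 are names made up for
illustration, id() merely stands in for ordering by object address, and none
of this is an existing CPython API.

```python
import threading


class LockedObject:
    """Hypothetical stand-in for a mutable object that carries its own mutex."""

    def __init__(self, name):
        self.name = name
        self.mutex = threading.Lock()


def lock2(a, b):
    """Acquire both mutexes in a globally consistent order.

    Ordering by id() (assumed here to play the role of the object's address)
    means lock2(a, b) and lock2(b, a) take the locks in the same sequence,
    so two threads cannot each hold one lock while waiting for the other.
    """
    first, second = sorted((a, b), key=id)
    first.mutex.acquire()
    if second is not first:  # lock2(a, a) must not acquire the same lock twice
        second.mutex.acquire()


def unlock2(a, b):
    """Release both mutexes in the reverse of the acquisition order."""
    first, second = sorted((a, b), key=id)
    if second is not first:
        second.mutex.release()
    first.mutex.release()


def worker(x, y, counter, rounds):
    """Repeatedly lock both objects, then bump a shared counter."""
    for _ in range(rounds):
        lock2(x, y)
        try:
            counter[0] += 1  # protected: both mutexes are held here
        finally:
            unlock2(x, y)


def run_demo(rounds=10000):
    """Run two threads that name the objects in opposite order."""
    a, b = LockedObject("a"), LockedObject("b")
    counter = [0]
    t1 = threading.Thread(target=worker, args=(a, b, counter, rounds))
    t2 = threading.Thread(target=worker, args=(b, a, counter, rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return counter[0]
```

Calling run_demo() starts one thread that locks (a, b) and another that locks
(b, a); with the ordering in place both should finish.  This only covers
locks taken together in a single call.  A thread that already holds one
object's mutex and then goes after another can still deadlock, which is the
trap you mention.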


