[Python-Dev] "Fixing" the new GIL

Sun Mar 14 14:54:13 CET 2010

On 3/14/10 7:31 AM, "Nir Aides" <nir at winpdb.org> wrote:

> There are two possible problems with Dave's benchmark:
> 
> 1) On my system setting TCP_NODELAY option on the accepted server socket
> changes results dramatically.
Could you document what you saw and explain how you think TCP_NODELAY makes
a difference, including what kind of system you ran your tests on and which
application demonstrates those dramatic results?
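
For reference, here is a minimal sketch of what I understand "setting
TCP_NODELAY on the accepted server socket" to mean. It is not taken from
Dave's benchmark or from Nir's modified server; the helper names
accept_with_nodelay() and demo() are made up for illustration, and the
loopback address is just an assumption to keep the example self-contained.

```python
import socket


def accept_with_nodelay(listener):
    """Accept one connection and disable Nagle's algorithm on it.

    Hypothetical helper for illustration only.
    """
    conn, addr = listener.accept()
    # TCP_NODELAY asks the TCP stack to send small writes immediately
    # instead of coalescing them while waiting for outstanding ACKs.
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return conn, addr


def demo():
    """Open a loopback connection and report whether the option stuck."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _ = accept_with_nodelay(listener)
    try:
        # A non-zero value means the option is enabled on this socket.
        return conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
    finally:
        conn.close()
        client.close()
        listener.close()
```

If this is roughly what was changed, it would help to know whether the
option was set on the server side only or on the client as well.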

> 2) What category of socket servers is Dave's spin() function intended to
> simulate?
What is the problem you are trying to get at with this question?

Does Dave's spin() function have to have a category? Or perhaps the question
is, for these solutions, what category of application do they hurt? Perhaps
we can look at the solutions as general, but consider their impact in
various categories.

> In a server which involves CPU intensive work in response to a socket request
> the behavior may be significantly different. In such a system, high CPU load
> will significantly reduce socket responsiveness which in turn will reduce CPU
> load and increase socket responsiveness.
Not sure I follow how high CPU load will in turn reduce CPU load. :) Can you
explain more about what you are trying to say here?

> Testing with a modified server that reflects the above indicates the new GIL
> behaves just fine in terms of throughput. So a change to the GIL may not be
> required at all.
Are you saying that a change to the new GIL may not be required at all?

Did your modified server run any worse with any of the proposed changes?
Could you document what you saw?

> There is still the question of latency - a single request which takes a long
> time to process will affect the latency of other "small" requests. However,
> one can argue whether such a scenario is practical, or whether modifying the
> GIL is the solution.
Perhaps Dave already documented this effect in his visualizations, no?

> If a change is still required, then I vote for the simpler approach - that of
> having a special macro for socket code.
What is that simpler approach? How would that special macro work?

> I remember there was reluctance in the past to repeat the OS scheduling
> functionality and for a good reason.

Folks,

Do we believe that every other application that has contention for one
resource leaves the solution of that problem to the OS scheduler?

In what ways do we consider the CPython interpreter to be different than
another application that has multiple threads and contention for one
resource? Perhaps we have a unique problem among all other user space
applications. Perhaps we don't.

Do we see Antoine's or Dave's proposed solutions as being solutions to a
niche problem? Or are they algorithms for handling one or more of the
behaviors the CPython interpreter will encounter when running the many
python applications across OS/CPU environments?

As for the behavior of the GIL, how are the proposed solutions repeating OS
scheduling functionality? Could we instead look at these solutions as
repeating what other applications have done that have contention for one
resource?

Can anyone show that Antoine's proposed solution hurts performance?

-peter

> Nir
> 
> 
> On Sat, Mar 13, 2010 at 11:46 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> 
>> Hello,
>> 
>> As some of you may know, Dave Beazley recently exhibited a situation
>> where the new GIL shows quite a poor behaviour (the old GIL isn't very
>> good either, but still a little better). This issue is followed in
>> http://bugs.python.org/issue7946
>> 
>> This situation is when an IO-bound thread wants to process a lot of
>> incoming packets, while one (or several) CPU-bound thread is also
>> running. Each time the IO-bound thread releases the GIL, the CPU-bound
>> thread gets it and keeps holding it for at least 5 milliseconds
>> (default setting), which limits the number of individual packets which
>> can be recv()'ed and processed per second.
>> 
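To put a rough number on the limit described above: if every GIL release by
the IO-bound thread costs one full switching interval before it can run
again, the interval alone caps the number of handoffs per second. The sketch
below works that out. It assumes exactly one GIL release per recv() and
ignores all other overhead, so treat the result as an upper bound, not a
measurement. The function name max_handoffs_per_second() is made up for
illustration.

```python
import sys


def max_handoffs_per_second(switch_interval_ms):
    """Upper bound on GIL handoffs per second when each handoff to the
    CPU-bound thread costs one full switching interval (in milliseconds)."""
    return 1000 // switch_interval_ms


# With the default 5 ms interval mentioned above:
print(max_handoffs_per_second(5))  # 200

# The interpreter reports its current interval in seconds.
print(sys.getswitchinterval())
```

Lowering the interval with sys.setswitchinterval() raises that ceiling, at
the price of more frequent switching between the CPU-bound threads.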
>> I have proposed two mechanisms, based on the same idea: IO-bound
>> threads should be able to steal the GIL very quickly, rather than
>> having to wait for the whole "thread switching interval" (again, 5 ms
>> by default). They differ in how they detect an "IO-bound thread":
>> 
>> - the first mechanism is actually the same mechanism which was
>>   embodied in the original new GIL patch before being removed. In this
>>   approach, IO methods (such as socket.read() in socketmodule.c)
>>   releasing the GIL must use a separate C macro when trying to get the
>>   GIL back again.
>> 
>> - the second mechanism dynamically computes the "interactiveness" of a
>>   thread and allows interactive threads to steal the GIL quickly. In
>>   this approach, IO methods don't have to be modified at all.
>> 
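To make the second mechanism easier to discuss, here is a toy model of what
"computing interactiveness" could look like. This is NOT Antoine's patch and
is not based on its code. The class name, the decay factor, and the
threshold are all assumptions invented for illustration. The idea sketched
here is a moving average over how a thread gave up the GIL: voluntarily
(for example, blocking on IO) or because it was forced to after the
switching interval expired.

```python
class InteractivenessTracker:
    """Toy per-thread score; hypothetical, not from any actual GIL patch."""

    def __init__(self, decay=0.5, threshold=0.6):
        self.decay = decay          # weight given to past behaviour
        self.threshold = threshold  # score needed to count as interactive
        self.score = 0.0

    def record_release(self, voluntary):
        """Update the score each time the thread gives up the GIL.

        voluntary=True  -> the thread released the GIL itself (blocking IO)
        voluntary=False -> the thread was forced to drop it on a timeout
        """
        sample = 1.0 if voluntary else 0.0
        self.score = self.decay * self.score + (1.0 - self.decay) * sample

    def may_steal_gil(self):
        """Interactive threads would be allowed to take the GIL quickly."""
        return self.score >= self.threshold


io_thread = InteractivenessTracker()
cpu_thread = InteractivenessTracker()
for _ in range(4):
    io_thread.record_release(voluntary=True)
    cpu_thread.record_release(voluntary=False)

print(io_thread.may_steal_gil())   # True
print(cpu_thread.may_steal_gil())  # False
```

In a model like this, a thread that starts burning CPU loses its privilege
after a forced release or two, which is the property one would want from
any such heuristic.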
>> Both approaches show similar benchmark results (for the benchmarks
>> that I know of) and basically fix the issue put forward by Dave Beazley.
>> 
>> Any thoughts?
>> 
>> Regards
>> 
>> Antoine.
>> 
>> 
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev at python.org
>> http://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: 
>> http://mail.python.org/mailman/options/python-dev/nir%40winpdb.org
