Future of Pypy?

Paul Rubin no.email at nospam.invalid
Mon Feb 23 20:47:32 EST 2015


Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
> Deadlocks, livelocks, and the simple fact that debugging threaded code is 
> much, much, much harder than debugging single-thread code.

Deadlocks and livelocks can happen in multi-process code just like with
threads.  Debugging threaded code hasn't been too bad a problem for me
so far.  The basic tactic is use the logging module a lot (it is thread
safe), log the i/o events coming into the system if you observe
misbehaviour, and play the log back through your test harness to
reproduce the problem and debug it by normal means.  You have to do
something like that anyway, if the indeterminacy in the system comes
from the unpredictable ordering of external i/o events.  IMHO you'd get
basically similar bugs with other concurrency mechanisms.

Other programs like Postgres are written in dangerous languages like C
and use millions of LOC and vast numbers of fine-grained locks and are
still considered very reliable.  Python threads communicating through
queues have to be a breeze by comparison.  Compared to what Linux kernel
hackers have to deal with, it's not even on the same planet.  This book
is amazing:

https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html

> No. You [generic you] sux, for not investigating whether multiprocessing 
> will do the job. It even has the same public API as threads.

I will give multiprocessing a try next time the matter comes up (it
hasn't always been available) but it doesn't seem as flexible.  E.g. say
I have an employee-handling thread (untested pseudocode, forgive
errors):

   import queue
   import threading

   salaries = {'Steven': 50000}              # example data
   employee_request_queue = queue.Queue()

   def employee_loop(request_queue):
       # Block on the queue and run each (func, args) request as it arrives.
       while True:
           func, args = request_queue.get()
           func(*args)

   def adjust_salary(person, update_func):
       salaries[person] = update_func(salaries[person])

   threading.Thread(target=employee_loop,
                    args=[employee_request_queue],
                    daemon=True).start()

Now in another thread, I want to give you a 50% raise:

    employee_request_queue.put(
        (adjust_salary, ('Steven',
                         lambda old_salary: old_salary * 1.5)))

Can I do that with multiprocessing without a bunch more boilerplate?
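One concrete obstacle (a sketch of the problem, not a survey of every workaround): multiprocessing queues serialize each item with pickle before shipping it to the other process, and pickle refuses lambdas, so the closure-in-a-queue pattern above doesn't port over directly.

```python
import pickle

# multiprocessing.Queue pickles each item before sending it across the
# process boundary, and lambdas are not picklable, so the request tuple
# used in the threaded version fails here.
try:
    pickle.dumps(('Steven', lambda old_salary: old_salary * 1.5))
    print("pickled fine")
except Exception as exc:
    print("cannot pickle the request:", exc)
```

You'd have to replace the lambda with a module-level function (or send data and dispatch on it in the worker), which is exactly the extra boilerplate in question.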

> Even in GIL-less Python, two threads aren't twice as fast as one
> thread. So there comes a point of diminishing returns

Well, if two threads are 1.5x as fast as one thread, that's a win.

> Python does have the same capabilities. Jython and IronPython aren't 
> different languages, they are Python.

Oh ok.  I think of Jython as a Java thing and IronPython as a Windows
thing and I don't want to deal with the monstrous JVM or CLR systems.
So I unconsciously didn't think of them as Python.  I do think of PyPy
as Python.

> If you want *CPython* to work without a GIL, well, are you
> volunteering to do the work? It is a massive job, and the core devs
> aren't terribly interested.

It's not a massive job: it's been done, and it worked.  The problem was
that locking all the CPython refcount updates slowed down the important
single-CPU case, so it was never adopted.  So the real massive job would
be moving to a tracing GC and changing every extension module ever
written.  But that's what PyPy is (among other things), so I'm waiting
to hear from Laura why PyPy has a GIL.



More information about the Python-list mailing list