[Python-Dev] Making python C-API thread safe (try 2)

Harri Pesonen fuerte at sci.fi
Tue Sep 16 13:22:41 EDT 2003


Jeff Epler wrote:

> Harri,
> I don't understand how your suggested changes, even once carried out,
> will let me use threads to increase the speed of a CPU-intensive Python
> program.  For instance, consider the following code:
> 
> solutions = []
> 
> def is_solution(l):
>     # CPU-intensive code here.  Let's say it runs for 1 second per call
>     ...
> 
> def consider_solution(l):
>     if is_solution(l):
>         solutions.append(l)
> 
> def problem_space():
>     # A generator of items in the problem space
>     # pretty fast!
>     ...
>     yield l
> 
> def all_solutions():
>     for l in problem_space():
>         consider_solution(l)
> 
> 
> I could thread it, so that N threads each run is_solution on a different
> candidate:
>     def all_solutions():
>         queue = worker_tasks(consider_solution, N)
>         for l in problem_space():
>             queue.add(l)
>         queue.shutdown()
> 
> But with your proposed changes, it sounds like each thread becomes an
> island, with no access to common objects (like the list "solutions" or
> the queue connecting the main thread with the worker threads).
> If threading truly worked, then I'd be able to run efficiently on n*1
> CPUs, where n is the ratio of the speed of one iteration of is_solution
> compared to one iteration of problem_space.
> 
> On the other hand, I can make the above work quickly today by using
> processes and pipes.  I can do this only because I've identified the
> parts that need to be shared (the queue of candidate solutions, and the
> list of confirmed solutions).  I think that's the same level of effort
> required under the "thread is an island" approach you're suggesting, but
> the processes&pipes code will likely be easier to write.

I mostly agree with what you said.
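
Your worker_tasks helper can already be written with threading and the
Queue module; here is a rough sketch (the add and shutdown names are
taken from your example, everything else is guesswork):

import threading, Queue

class worker_tasks:
    # Minimal stand-in for the worker_tasks() helper in the example.
    STOP = object()                       # sentinel telling a worker to quit

    def __init__(self, func, n):
        self.queue = Queue.Queue()
        self.threads = []
        for i in range(n):
            t = threading.Thread(target=self.run, args=(func,))
            t.start()
            self.threads.append(t)

    def run(self, func):
        while 1:
            item = self.queue.get()
            if item is self.STOP:
                break
            func(item)                    # e.g. consider_solution(l)

    def add(self, item):
        self.queue.put(item)

    def shutdown(self):
        for t in self.threads:
            self.queue.put(self.STOP)     # one stop marker per worker
        for t in self.threads:
            t.join()

Of course, with the current interpreter the global interpreter lock
means those N workers take turns running is_solution, so this gains
nothing on multiple CPUs; that is the whole motivation here.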

Another approach to shared memory access would be special syntax or
functions for it. Internally these functions could do just what you
said: use pipes, or whatever. Each thread could have a name (a string),
and then we would only need a couple of simple built-in functions to
send messages (strings) from thread to thread, to peek for pending
messages, and to wait for messages. One message would have the special
meaning Quit, so that a thread knows when to stop. Basically this is
all that is needed for these independent threads.
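
For example, the interface could look roughly like this (the function
names are invented just for illustration, and the whole thing is faked
here on top of the current threading and Queue modules; in the real
thing these would be built-ins passing strings between completely
independent interpreters):

import threading, Queue

QUIT = "QUIT"                  # the special message that means "stop"

_mailboxes = {}                # thread name -> queue of incoming messages
_lock = threading.Lock()

def _mailbox(name):
    _lock.acquire()
    try:
        return _mailboxes.setdefault(name, Queue.Queue())
    finally:
        _lock.release()

def send_message(to_name, text):
    # put a string into the mailbox of the thread named to_name
    _mailbox(to_name).put(text)

def peek_message(name):
    # fetch a pending message without blocking; None if there is none
    try:
        return _mailbox(name).get_nowait()
    except Queue.Empty:
        return None

def wait_message(name):
    # block until a message for name arrives
    return _mailbox(name).get()

# a worker that runs until it receives the Quit message
def worker(name):
    while 1:
        msg = wait_message(name)
        if msg == QUIT:
            break
        send_message("main", "done: " + msg)

threading.Thread(target=worker, args=("w1",)).start()
send_message("w1", "task 1")
reply = wait_message("main")   # "done: task 1"
send_message("w1", QUIT)

A mailbox is created on demand the first time a name is used, and Quit
is just an ordinary string with an agreed meaning.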

Another approach would be a separate, shared interpreter state, access
to which would be synchronized. This is probably much harder to
implement, but it would be more beautiful: the shared state could hold
all the different kinds of objects, just like in normal non-threaded
Python. So if you have two threads, you would have three independent
interpreter states (one of them shared).

It is more efficient to have one process with several threads than
several processes each having one thread.
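
Still, for comparison, the processes-and-pipes version you mention can
be written on Unix with fork and pipe, something like this (only a
sketch: is_solution is a dummy stand-in and the candidates travel down
the pipes as text lines):

import os

N = 4                                   # number of worker processes (a guess)

def is_solution(candidate):
    # dummy stand-in for the expensive one-second test
    return candidate % 1000003 == 0

def worker(task_r, result_w):
    # Child process: read every candidate first (the parent closes the
    # pipe when it has written them all), then test them and report hits.
    tasks = os.fdopen(task_r)
    candidates = [int(line) for line in tasks.readlines()]
    results = os.fdopen(result_w, "w")
    for c in candidates:
        if is_solution(c):
            results.write("%d\n" % c)
    results.close()

def all_solutions(problem_space):
    workers = []
    for i in range(N):
        task_r, task_w = os.pipe()      # parent -> child: candidates
        result_r, result_w = os.pipe()  # child -> parent: solutions
        pid = os.fork()
        if pid == 0:                    # in the child
            os.close(task_w)
            os.close(result_r)
            # close pipe ends inherited from earlier workers, so that
            # closing a pipe in the parent really means end-of-file
            for other_pid, other_task_w, other_result_r in workers:
                other_task_w.close()
                other_result_r.close()
            worker(task_r, result_w)
            os._exit(0)
        os.close(task_r)
        os.close(result_w)
        workers.append((pid, os.fdopen(task_w, "w"), os.fdopen(result_r)))

    # Deal the candidates out round-robin; the children read as we write.
    i = 0
    for candidate in problem_space:
        workers[i % N][1].write("%d\n" % candidate)
        i = i + 1
    for pid, task_w, result_r in workers:
        task_w.close()                  # end-of-file starts the testing

    solutions = []
    for pid, task_w, result_r in workers:
        solutions.extend([int(line) for line in result_r.readlines()])
        os.waitpid(pid, 0)
    return solutions

Here the N children really do run is_solution at the same time on N
processors, but only because the shared things (the candidate queue and
the solution list) have been turned into explicit pipes, which is the
level of effort you describe.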

Harri




