moving to processing from threading, global variables

Dave Angel davea at ieee.org
Sun Apr 12 16:13:21 EDT 2009


rkmr.em at gmail.com wrote:
> no one? how can i share variables in processing?
> please help out!
>
> On Thu, Apr 9, 2009 at 8:05 AM, rkmr.em at gmail.com <rkmr.em at gmail.com> wrote:
>
>   
>> hi
>> i am trying to move from the threading package to the processing package.
>> this is the controller i used to spawn new threads, and it used the global
>> variable done to check whether it needs to spawn more threads or not. it
>> worked great for me. it checks if there is new data to be processed every
>> 30 seconds, and spawns off new threads to do the work, till all the work
>> is done.
>>
>>
>> but now with processing, any change made to the global variable done in the
>> sendalert function is not reflected in the control function.
>>
>> can you please point out what is wrong?
>> thanks a lot!
>>
>>
>> code
>>
>>
>> import time
>> import processing
>>
>> done = False
>>
>> def sendalert():
>>     global done
>>     users = q.get('xx')
>>     if not users:
>>         done = True
>>         return
>>
>>     done = False
>>     for u in users:
>>         pass  # do stuff
>>
>>
>> def control(number_threads_min=3, number_threads_max=100):
>>     global done
>>     while True:
>>         number_threads = len(processing.activeChildren())
>>         if not done and number_threads < number_threads_max:
>>             processing.Process(target=sendalert).start()
>>         if done and number_threads < number_threads_min:
>>             processing.Process(target=sendalert).start()
>>             time.sleep(30)
>>
>> if __name__ == '__main__':
>>     processing.Process(target=control).start()
>>
>>     
Some of the discussion below varies between operating systems, 
especially historically.  For example, 16-bit Windows (say, version 3.1) 
didn't have separate address spaces or threads.  And certainly some of 
the capabilities vary by programming language.

There are several different models of two routines running 
"simultaneously":
   1) co-routines - the closest thing Python has is generators, where 
the generator runs for a while, then yields up some time to its caller.  
Co-routines share everything, but they can be a pain to use in the 
general case.
   2) threads - these run in the same process (and address space), and 
share variables, file handles, etc.  The only real distinction between 
the threads is scheduling.
   3) processes - these run in separate address spaces.  By default 
nothing is shared between them.  But there are OS functions that allow 
them to share pretty intimately.  (See the short sketch after this list.)
   4) computers - these run on separate computers.  They communicate 
only by I/O operations, such as pipes, sockets, etc.
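
Here's a minimal, untested sketch of the difference between 2) and 3), 
written against the standard multiprocessing module (the descendant of 
the processing package you're using, which spells Process the same way); 
the names are just illustrative:

import threading
import multiprocessing   # modern descendant of the 'processing' package

done = False

def worker():
    global done
    done = True    # rebinds the global in whichever address space runs it

if __name__ == '__main__':
    # Thread: same address space, so the module-level 'done' really changes.
    t = threading.Thread(target=worker)
    t.start()
    t.join()
    print("after thread:", done)      # True

    # Process: separate address space, so the parent's 'done' is untouched.
    done = False
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()
    print("after process:", done)     # still False

With a process, the child only ever modifies its own copy of done, which 
is why the change made in sendalert() never shows up in control().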

There are reasons in many environments to switch from threads to 
processes.  Most common is that something you need done is already 
encapsulated in an executable, so you spawn that as a separate process.

Another reason is that some library the two tasks use is not threadsafe, 
so you don't dare have both operations in the same process.  The simplest 
example of that is GUI programming, where most of the GUI code must run 
in a single thread, with a single event loop.

But now to your question.  When you have two separate processes, you can 
share data by various means the OS provides.  Simplest is to use the 
command line to pass parameters to the child process, and the return 
code to pass an (int) value back.  Next easiest is to simply write data 
to a file that you can both see.  There are ways to lock the file so 
you're not both updating it at the same time.  Next after that is a pipe 
(or a socket, which is somewhat more general).  And finally shared 
memory.  Shared memory is the easiest to conceptualize.  Basically you 
both map a hunk of shared memory and take turns reading and writing it.  
But as with all the others, you have to be careful about concurrent or 
overlapping updates.  For example, if process A is changing a string 
(not a Python string object, but a similar abstraction) from "Howard" to 
"Gloria", you don't want process B to fetch the value "Hooria" by mistake.

Generally, you want to do the simplest sharing that can solve the need.  
In your case, if 'done' were the only global, I'd recommend using the 
presence or absence of a file as your communication method.  Process A 
writes it or deletes it, and Process B checks for its existence, and 
maybe its timestamp.  As for the filename, you can pass that on the 
command line when you start the second process.
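
For completeness, a rough, untested sketch of that flag-file idea; the 
path and helper names are just placeholders, and in real use you'd take 
the path from the command line as described above:

import os
import time

FLAG = '/tmp/sendalert.done'

# In the worker (process A): raise or lower the flag.
def mark_done():
    open(FLAG, 'w').close()        # create an empty file

def mark_not_done():
    if os.path.exists(FLAG):
        os.remove(FLAG)

# In the controller (process B): poll for the flag every 30 seconds.
def work_is_done():
    return os.path.exists(FLAG)

def control_loop():
    while True:
        if not work_is_done():
            pass    # spawn more workers here
        time.sleep(30)

A stale flag only costs you one extra 30-second polling cycle, which is 
cheap compared to getting the locking right on a file you actually write 
data into.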




