[Tutor] running multiple concurrent processes

Dave Angel d at davea.name
Tue Oct 30 21:43:54 CET 2012


On 10/30/2012 03:18 PM, richard kappler wrote:
> As I sit through the aftermath of Sandy, I have resumed my personal quest
> to learn python. One of the things I am struggling with is running multiple
> processes. I read the docs on threading and am completely lost so am
> turning to the most excellent tutors here (and thanks for all the help,
> past, present and future btw!).
>
> In what ways can one run multiple concurrent processes and which would be
> considered the "best" way or is that case dependent?
>
> Example:
>
> I'm working on programming a robot in Python. The bot has an Arduino board
> that receives sensor input and sends the data to a laptop which is the
> "brain" of the bot via pySerial and uses this incoming sensor data to help
> determine state. State is used in decision making. The laptop runs a
> program that we'll call the Master Control Program in a nod to Tron. The
> bot also has a chat program, computer vision, some AI it uses to mine the
> web for information, several other functions. Each of these  concurrent
> programs (thus far all python programs) must run continuously and feed data
> to the MCP which receives the data, makes decisions and sends out
> appropriate action commands such as for movement, change of state,
> conversation direction, what to research, etc.
>
> So, to return to the original question, how does one run multiple
> concurrent processes within python?
>
>

I'm only guessing about your background, so please don't take offense at
the simple level of the following.  You see, before you can really
understand how the language features work, and what the various terms
mean, you need to understand the processor and the OS.

A decade or so ago, things were a bit simpler -- if we wanted a faster
machine, Intel would crank up the processor clock rate.  But eventually
it reached the point where increasing the clock rate became VERY
expensive, and Intel (and others) came up with a different strategy:
more processors, rather than faster ones.

I'm going to guess you're running on some variant of the Pentium
processor.  The processor (CPU) has a feature called hyperthreading,
meaning that for most operations it can do two things at once.  It has
two copies of the instruction pointer and two copies of most registers,
so as long as neither program uses the features that aren't replicated,
it can run two programs largely independently.  The two programs share
physical memory, hard disk, keyboard and screen (and a few execution
units inside the chip), but they probably won't slow each other down
very much.

You may have a dual-core, or even a quad-core processor.  And you may
have more than one of those, if you're on a high-end server.  So, as
long as the processes are separate, you could run many of them at a time.
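If you're curious how many of those your own machine has, Python will
tell you; note that the count includes hyperthreads as well as real
cores:

    import multiprocessing

    # number of logical CPUs the OS sees (cores x hyperthreads)
    print(multiprocessing.cpu_count())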

The other thing that affects all of this is the operating system you're
running.  It has to manage these multiple processes, and make sure that
things that can't be shared are correctly serialized: one task grabs a
resource and the others block, waiting for that resource.  The most visible
(but not the most important) way this occurs is that separate
applications draw in different windows.  They share the screen, but none
of them writes to the raw device, all of them go through a window manager.

This is multiprocessing.  And since one program can launch others, it's
one way that a single "task" can be split up to use these multiple
cores/CPUs.  The operating system deliberately keeps the separate
processes very isolated, but it provides a few ways for them to talk to
each other: one program can launch another, passing it arguments and
observing its return code; it can use pipes to connect to the stdin and
stdout of the other program; the two can open queues or shared memory;
or they can each read and write a common file.  Such processes do NOT
normally share variables, and a function call in one does not easily
end up invoking code in the other.
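Here's a minimal sketch of that style, using the standard library's
multiprocessing module (the name sensor_worker and the message are
just made up for illustration):

    import multiprocessing

    def sensor_worker(q):
        # runs in a completely separate process, with its own memory
        q.put("sensor reading: 42")

    if __name__ == "__main__":
        q = multiprocessing.Queue()    # a queue the OS carries between processes
        p = multiprocessing.Process(target=sensor_worker, args=(q,))
        p.start()
        print(q.get())                 # blocks until the child sends something
        p.join()

Something like this is roughly the shape your MCP could take: one
process per sensor or service, each feeding a queue the MCP reads.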

But there is a second way that two CPUs can work on the same "task."
If a single process is multi-THREADED, then the threads DO share
variables and other resources, and communication between them is easy
(so easy it's difficult to get right, actually).  This is theoretically
much gentler on system resources, but it's likely to cost you a lot
more bugs.
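A minimal sketch of that style, with two threads updating one shared
variable (the lock is what keeps the sharing correct):

    import threading

    counter = 0
    lock = threading.Lock()

    def worker():
        global counter
        for _ in range(100000):
            with lock:      # without this, updates can interleave and be lost
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)          # 200000, but only because of the lock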

Some operating systems have a feature called forking, which can
theoretically give you the best of both worlds.  But I'm not going to
even try to explain that unless you tell me you're on a Linux or Unix
type operating system.  Besides, I don't know how much of Python uses
such a fork; it hasn't turned out to be necessary information for me yet.
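(For the curious anyway, here is the raw shape of it; Python exposes
the call as os.fork, and it only exists on Unix-type systems:)

    import os

    pid = os.fork()          # the process splits in two right here
    if pid == 0:
        print("child: pid %d" % os.getpid())
    else:
        print("parent: child is pid %d" % pid)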

Now, with CPython in particular, multithreading has a serious problem:
the global interpreter lock (GIL).  Since so much happens behind the
scenes inside the interpreter and the low-level library routines, and
perhaps since most of that was written before multithreading was
supported, there's a single lock that permits only one thread of a
process to be executing Python code at a time.  So if you break up a
CPU-bound task into multiple threads, only one will run at a time, and
chances are the whole thing will run slower than if it had only one
thread.
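You can see the effect with a sketch like this (the loop size is
arbitrary; exact timings will vary by machine):

    import threading
    import time

    def burn():
        # purely CPU-bound work; nothing here releases the GIL for long
        n = 0
        for _ in range(5000000):
            n += 1

    start = time.time()
    burn()
    burn()
    print("sequential:  %.2f seconds" % (time.time() - start))

    start = time.time()
    threads = [threading.Thread(target=burn) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("two threads: %.2f seconds" % (time.time() - start))

On CPython the threaded run is usually no faster than the sequential
one, and often a bit slower.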

Two things happen to make the GIL less painful (really two
manifestations of the same thing).  Many times when a thread is running
C code, or when it is calling some system function that blocks (e.g.
waiting for a network message), the GIL is deliberately released, and
other threads CAN run.  So writing a server that waits on many sockets,
one per thread, can make good sense, both for code simplicity and for
performance.
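A sketch of that pattern -- an echo server, with the port number chosen
arbitrarily:

    import socket
    import threading

    def handle(conn):
        while True:
            data = conn.recv(1024)    # blocks, releasing the GIL while it waits
            if not data:
                break
            conn.sendall(data)        # echo it straight back
        conn.close()

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("", 12345))          # any free port would do
    server.listen(5)
    while True:
        conn, addr = server.accept()
        threading.Thread(target=handle, args=(conn,)).start()

While one thread sits blocked in recv(), the others are free to run, so
the GIL rarely gets in the way here.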

One other thing that's related: most GUI programs run with an event
loop, which is another way of juggling several activities that does NOT
use any special CPU or OS features.  With an event loop, it's your job
to make sure all the transactions are reasonably small, and that each
one is triggered by some event.  Once you understand event loops, this
approach is simpler than either of the others.  Note that sometimes two
or three of these approaches are combined in one system.
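A sketch with Tkinter, which ships with Python: the callback does one
small piece of work and then asks the loop to call it again, so the
window stays responsive the whole time:

    try:
        import Tkinter as tk          # Python 2
    except ImportError:
        import tkinter as tk          # Python 3

    ticks = [0]

    def poll():
        # one small transaction, then hand control back to the event loop
        ticks[0] += 1
        label.config(text="ticks: %d" % ticks[0])
        root.after(100, poll)         # ask the loop to call us again in 100 ms

    root = tk.Tk()
    label = tk.Label(root, text="starting...")
    label.pack()
    root.after(100, poll)
    root.mainloop()                   # the event loop itself; dispatches every event

In a real robot, a callback like poll() is where you might check the
serial port for new sensor data, for example.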

Hope this helps.  I know that in places I oversimplified, but I think I
caught the spirit of the tradeoffs.


-- 

DaveA


