Threads vs Processes

Thu Jul 27 23:53:54 EDT 2006

It seems that both approaches are here to stay. If one were so inferior
and problem-prone, we wouldn't be talking about it now; it would have
been forgotten on the same shelf as a stack of punch cards.

The rule of thumb is 'the right tool for the right job.'

The threading model is very useful for long CPU-bound processing, as it
can potentially take advantage of multiple CPUs/cores (alas, not in
Python at the moment, because of the GIL). Events will not work as well
here. But note: if the threads would not share many resources, processes
could be used instead! It turns out that there are very few cases where
threads are simply indispensable.
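
To make the "processes instead of threads" point concrete, here is a
minimal sketch. It assumes the multiprocessing module found in the
standard library of later Python versions; count_primes and
count_primes_parallel are hypothetical names invented for this
illustration, and the prime counting is just a stand-in for any
CPU-bound job that shares no state.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound stand-in: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    # Each worker is a separate process with its own interpreter and its
    # own GIL, so the chunks can run on different cores. Nothing is
    # shared, so no locks are needed.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    print(count_primes_parallel([10_000, 20_000, 30_000]))
```

The catch is that arguments and results are copied between processes,
so this only pays off when the work per call is large compared to the
data passed around.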

The event model is usually well suited to I/O, or to any case where a
large number of shared resources would require a lot of synchronization
if threads were used.
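
As a rough illustration of why the event model needs so little
synchronization, here is a sketch using asyncio from the standard
library of later Python versions (an assumption; it is not what Twisted
looks like). The names fetch, gather_results and run_event_demo are
hypothetical, and asyncio.sleep stands in for a real network wait.

```python
import asyncio


async def fetch(name, delay, results):
    # asyncio.sleep stands in for waiting on a socket. Control goes back
    # to the event loop at this await, and nowhere else in the function.
    await asyncio.sleep(delay)
    # No lock around the shared list: only one coroutine runs at a time,
    # all in the same thread.
    results.append(name)


async def gather_results():
    results = []
    # Both waits overlap, so the total time is about the longer delay.
    await asyncio.gather(
        fetch("slow", 0.05, results),
        fetch("fast", 0.01, results),
    )
    return results


def run_event_demo():
    return asyncio.run(gather_results())


if __name__ == "__main__":
    print(run_event_demo())
```

The shared list is touched by both tasks, yet there is nothing to lock,
because a task can only be interrupted at an explicit await.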

DBMSs are not a good example of a typical large system, so saying 'see,
DBMSs use threads -- therefore threads are better' doesn't make a good
argument. DBMSs are highly optimized, and only a few of them actually
manage to take advantage of multiple execution units. One could just as
well cite a hundred other projects and say 'see, it uses an event
model -- therefore event models are better', and so on. Again: 'the
right tool for the right job'. A good programmer should know both...

> And consequently, to use Twisted you rewrite all your code as
> those 'deferred' things.
>
Then try rewriting Twisted using threads, in the same number of lines
and with the same or better performance. I bet you'll end up with a
whole bunch of 'locks', 'waits' and 'notify's instead of a bunch of
"those 'deferred' things." Debugging all those threads would be a
project in and of itself.
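
For a sense of what those 'locks', 'waits' and 'notify's look like even
in the smallest case, here is a sketch of handing items from one thread
to another with threading.Condition. Mailbox and run_mailbox_demo are
hypothetical names made up for this example; in real code queue.Queue
already wraps this pattern.

```python
import threading
from collections import deque


class Mailbox:
    """Hypothetical one-way queue guarded by a condition variable."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:             # take the lock
            self._items.append(item)
            self._cond.notify()      # wake one waiting consumer

    def get(self):
        with self._cond:
            # Re-check in a loop: a wakeup does not guarantee that an
            # item is still there by the time this thread runs.
            while not self._items:
                self._cond.wait()    # releases the lock while sleeping
            return self._items.popleft()


def run_mailbox_demo(n=5):
    box = Mailbox()
    received = []

    def consumer():
        for _ in range(n):
            received.append(box.get())

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(n):
        box.put(i)
    worker.join()
    return received


if __name__ == "__main__":
    print(run_mailbox_demo())
```

That is one shared structure and two threads. Forgetting the while loop
or the notify gives a bug that only shows up under the right timing,
which is the debugging cost I mean.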

-Nick

bryanjugglercryptographer at yahoo.com wrote:
> mark wrote:
> > The debate should not be about "threads vs processes", it should be
> > about "threads vs events".
>
> We are so lucky as to have both debates.
>
> > Dr. John Ousterhout (creator of Tcl,
> > Professor of Comp Sci at UC Berkeley, etc), started a famous debate
> > about this 10 years ago with the following simple presentation.
> >
> > http://home.pacbell.net/ouster/threads.pdf
>
> The Ousterhout school finds multiple lines of execution
> unmanageable, while the Tanenbaum school finds asynchronous I/O
> unmanageable.
>
> What's so hard about single-line-of-control (SLOC) event-driven
> programming? You can't call anything that might block. You have to
> initiate the operation, store all the state you'll need in order
> to pick up where you left off, then return all the way back to the
> event dispatcher.
>
> > That sentiment has largely been ignored and thread usage dominates but,
> > if you have been programming for as long as I have, and have used both
> > thread based architectures AND event/reactor/callback based
> > architectures, then that simple presentation above should ring very
> > true. Problem is, young people merely equate newer == better.
>
> Newer? They're both old as the trees. That can't be why the whiz
> kids like them. Threads and processes rule because of their success.
>
> > On large systems and over time, thread based architectures often tend
> > towards chaos.
>
> While large SLOC event-driven systems surely tend to chaos. Why?
> Because they *must* be structured around where blocking operations
> can happen, and that is not the structure anyone would choose for
> clarity, maintainability and general chaos avoidance.
>
> Even the simplest of modular structures, the procedure, gets
> broken. Whether you can encapsulate a sequence of operations in a
> procedure depends upon whether it might need to do an operation
> that could block.
>
> Going farther, consider writing a class supporting overriding of
> some method. Easy; we Pythoneers do it all the time; that's what
> O.O. inheritance is all about. Now what if the subclass's version
> of the method needs to look up external data, and thus might
> block? How does a method override arrange for the call chain to
> return all the way back to the event loop, and to pick up
> again with the same call chain when the I/O comes in?
>
> > I have seen a few thread based systems where the
> > programmers become so frustrated with subtle timing issues etc, and they
> > eventually overlay so many mutexes etc, that the implementation becomes
> > single threaded in practice anyhow(!), and very inefficient.
>
> While we simply do not see systems as complex as modern DBMS's
> written in the SLOC event-driven style.
>
> > BTW, I am fairly new to python but I have seen that the python Twisted
> > framework is a good example of the event/reactor design alternative to
> > threads. See
> >
> > http://twistedmatrix.com/projects/core/documentation/howto/async.html .
>
> And consequently, to use Twisted you rewrite all your code as
> those 'deferred' things.
> 
> 
> -- 
> --Bryan