Threading help?

Cliff Wells logiplexsoftware at earthlink.net
Thu Mar 7 12:32:41 EST 2002


On Wed, 06 Mar 2002 22:54:09 -0500
Peter Hansen wrote:

> Cliff Wells wrote:
> > Peter Hansen wrote:
> > > Cliff Wells wrote:
> > > > Hm.  Okay, I had to reconsider this.  Clearly if the processing is
> > > > slower than .1s and data is being added to it every .1s, the Queue
> > > > is going to endlessly grow as more data is added to it.  If this is
> > > > the case, it might make sense to have more than one consumer thread
> > > > (B) doing the processing.
> > >
> > > I might be missing something, but I don't see how adding another
> > > thread (with a slight extra overhead associated with it) would
> > > actually increase performance for what I infer is CPU-bound
> > > processing in thread B.
> > 
> > Yes, I thought of this, but I believe that this would give the net
> > effect of increasing thread priority for the B threads (since they
> > will get more CPU time as a whole vs. threads A and C).
> 
> Ah, interesting.  I hadn't thought before of how the Python
> interpreter's technique of executing a set number of instructions
> before switching context would let you do something tricky like that.
> Neat.
> 
> In any case, when you said "have thread A retrieve the data every .1s"
> I figured this was a soft realtime situation, and stealing CPU from
> thread A would not necessarily be what you need.  If between A and B
> you are taking up all the time, you need to optimize the code, or do
> what you suggested below and drop data (or increase the sampling
> period):
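(As an aside, the "set number of instructions" Peter mentions was tunable
via sys.setcheckinterval() in the Python of this era; current CPython has
replaced the bytecode count with a time slice, set via
sys.setswitchinterval().  A tiny illustration of the modern knob:)

```python
import sys

# Historically CPython considered a thread switch every N bytecode
# instructions (sys.setcheckinterval(N)); since Python 3.2 it instead
# uses a wall-clock time slice between switch checks.
sys.setswitchinterval(0.005)    # request a 5 ms switch interval
print(sys.getswitchinterval())
```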

On my way home last night (immediately after posting this, of course), I
realized that this is no good anyway: thread A will be
sleeping/retrieving and thread C will be blocking, waiting for data on
the queue, so having multiple thread B's wouldn't help much.  This is a
difficult problem in Python (assuming the processing time for B is
longer than the time constraints given).  Simply put, it can't be done
in a straightforward fashion.  Basically it comes down to this: data is
required every .1s, but the processing on that data takes >.1s.  Because
of the global interpreter lock, SMP doesn't buy you much with Python
threads, so the only approaches that would work are a distributed model
(which could still be run on a single SMP PC by setting the CPU affinity
of multiple interpreter processes) or dropping data.
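(For what it's worth, the multiple-interpreter idea can be sketched with
the standard-library multiprocessing module -- which didn't exist yet
when this was written.  Each worker is a full OS process with its own
interpreter and GIL, so CPU-bound processing really does run in
parallel.  square and run are names invented for this sketch:)

```python
import multiprocessing as mp

def square(in_q, out_q):
    # Each worker is a separate interpreter/process, so CPU-bound work
    # here can run on another core -- no GIL contention between workers.
    while True:
        item = in_q.get()
        if item is None:            # sentinel: time to shut down
            break
        out_q.put(item * item)      # stand-in for the >.1s processing

def run(samples, nworkers=2):
    in_q, out_q = mp.Queue(), mp.Queue()
    workers = [mp.Process(target=square, args=(in_q, out_q))
               for _ in range(nworkers)]
    for w in workers:
        w.start()
    samples = list(samples)
    for s in samples:               # thread A's role: feed the data in
        in_q.put(s)
    for _ in workers:
        in_q.put(None)              # one sentinel per worker
    results = sorted(out_q.get() for _ in samples)
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    print(run(range(5)))
```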

> > But maybe a better approach would be this: if processing time is
> > greater than .1s and we don't care about out-of-date data (big
> > assumption, but not unreasonable), then use the Queue between A and B
> > and simply have B empty the Queue on every iteration, processing only
> > the newest data.  This would keep the Queue to a reasonable size at
> > the cost of dropping data.
> 
> I suppose the acceptability of that depends on your requirements...
> dropping data is never a very general approach. :-)

Not general, but acceptable in cases where single blocks of data are
less important than the overall flow of data (e.g. streaming video).
This case sounds like it might fall into that category (I'm guessing
it's simply updating a web page with snapshots of data), so dropping
data would be an option.  Of course, all this speculation only applies
if thread B's processing time is >.1s and the process is long-running
enough that building a large queue would be an issue.
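(A minimal single-threaded sketch of that queue-emptying scheme --
newest_item is a helper name invented here, and the module is spelled
Queue rather than queue in older Pythons.  B calls this once per
iteration and processes only what it returns:)

```python
import queue

def newest_item(q):
    # Drain the queue without blocking, keeping only the most recent
    # entry; returns None if the queue was already empty.  Everything
    # older is silently dropped -- the "lossy" part of the scheme.
    item = None
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return item

q = queue.Queue()
for sample in range(5):     # pretend thread A has queued five samples
    q.put(sample)

latest = newest_item(q)     # thread B processes only the newest one
print(latest)
```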

Regards,

-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308



