Multiprocessing.Queue - I want to end.

Sat May 2 06:38:01 EDT 2009

: "Roel Schroeven" <rschroev_nospam_ml at fastmail.fm> wrote:

> Hendrik van Rooyen schreef:
> > I have always wondered why people do the one queue many getters thing.
> 
> Because IMO it's the simplest and most elegant solution.

That is fair enough...

> > 
> > Given that the stuff you pass is homogenous in that it will require a
> > similar amount of effort to process, is there not a case to be made
> > to have as many queues as consumers, and to round robin the work?
> 
> Could work if the processing time for each work unit is exactly the same
> (otherwise one or more consumers will be idle part of the time), but in
> most cases that is not guaranteed. A simple example is fetching data
> over the network: even if the data size is always the same, there will
> be differences because of network load variations.
> 
> If you use one queue, each consumer fetches a new work unit as soon it
> has consumed the previous one. All consumers will be working as long as
> there is work to do, without having to write any code to do the load
> balancing.
> 
> With one queue for each consumer, you either have to assume that the
> average processing time is the same (otherwise some consumers will be
> idle at the end, while others are still busy processing work units), or
> you need some clever code in the producer(s) or the driving code to
> balance the loads. That's extra complexity for little or no benefit.
> 
> I like the simplicity of having one queue: the producer(s) put work
> units on the queue with no concern which consumer will process them or
> how many consumers there even are; likewise the consumer(s) don't know
> and don't need to know where their work units come from. And the work
> gets automatically distributed to whichever consumer has first finished
> its previous work unit.

This is all true in the case of a job that starts, runs and finishes.
I am not so sure it applies to something that has a long life.

> 
> > And if the stuff you pass around needs disparate effort to consume,
> > it seems to me that you can more easily balance the load by having
> > specialised consumers, instead of instances of one humungous 
> > "I can eat anything" consumer.
> 
> If there is a semantic difference, maybe yes; but I think it makes no
> sense to differentiate purely on the expected execution times.

The idea is basically that you have the code that classifies in one
place only, instead of running in all the instances of the consumer.
Feels better to me, somehow.

> 
> > I also think that having a queue per consumer thread makes it easier
> > to replace the threads with processes and the queues with pipes or
> > sockets if you need to do serious scaling later.
> 
> Perhaps, but isn't that a case of YAGNI and/or premature optimization?

Yes and no:

Yes - You Are Gonna Need It.
and 
No it is never premature to use a decent structure.

:-)

- Hendrik