Multiprocessing.Queue - I want to end.

Roel Schroeven rschroev_nospam_ml at fastmail.fm
Fri May 1 18:04:45 EDT 2009


Hendrik van Rooyen wrote:
> I have always wondered why people do the one queue many getters thing.

Because IMO it's the simplest and most elegant solution.
> 
> Given that the stuff you pass is homogenous in that it will require a
> similar amount of effort to process, is there not a case to be made
> to have as many queues as consumers, and to round robin the work?

Could work if the processing time for each work unit is exactly the same
(otherwise one or more consumers will be idle part of the time), but in
most cases that is not guaranteed. A simple example is fetching data
over the network: even if the data size is always the same, there will
be differences because of network load variations.

If you use one queue, each consumer fetches a new work unit as soon as it
has consumed the previous one. All consumers will be working as long as
there is work to do, without having to write any code to do the load
balancing.
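
To make that concrete, here is a minimal sketch of the one-queue,
many-getters pattern. It uses threads and queue.Queue to keep it short;
the names run_shared_queue, process and the stop sentinel are my own
inventions for illustration, and putting one sentinel per consumer is
just one common way to let the consumers end.

```python
import queue
import threading


def run_shared_queue(work_units, num_consumers, process):
    """Feed work_units to num_consumers threads through one shared queue."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()
    stop = object()  # hypothetical sentinel: tells a consumer to end

    def consumer():
        while True:
            unit = work.get()  # blocks until a unit (or the sentinel) arrives
            if unit is stop:
                break
            outcome = process(unit)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for thread in threads:
        thread.start()

    # The producer neither knows nor cares which consumer takes which unit.
    for unit in work_units:
        work.put(unit)
    for _ in threads:
        work.put(stop)  # one sentinel per consumer, queued after the real work

    for thread in threads:
        thread.join()
    return results  # completion order, not necessarily input order
```

Something like run_shared_queue(range(10), 3, lambda x: x * x) returns
the ten squares in whatever order the consumers happened to finish them.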

With one queue for each consumer, you either have to assume that the
average processing time is the same (otherwise some consumers will be
idle at the end, while others are still busy processing work units), or
you need some clever code in the producer(s) or the driving code to
balance the loads. That's extra complexity for little or no benefit.
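
A rough way to see the difference is to compute when the last consumer
finishes under each scheme. The sketch below is only a toy model: it
assumes all work units are available up front, ignores queue overhead,
and the function names are made up for this example.

```python
import heapq


def round_robin_finish_time(durations, num_consumers):
    """One queue per consumer: unit i is assigned to consumer i % n in advance."""
    busy = [0] * num_consumers
    for index, duration in enumerate(durations):
        busy[index % num_consumers] += duration
    return max(busy)


def shared_queue_finish_time(durations, num_consumers):
    """One shared queue: each unit goes to whichever consumer is free first."""
    free_at = [0] * num_consumers  # time at which each consumer becomes idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available consumer
        heapq.heappush(free_at, start + duration)
    return max(free_at)
```

With durations [5, 1, 5, 1, 5, 1] and two consumers, the round-robin
version finishes at time 15 (one consumer gets all the slow units) and
the shared queue at time 11. With equal durations such as [2, 2, 2, 2]
both finish at time 4, which is the case where round robin does work.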

I like the simplicity of having one queue: the producer(s) put work
units on the queue with no concern which consumer will process them or
how many consumers there even are; likewise the consumer(s) don't know
and don't need to know where their work units come from. And the work
gets automatically distributed to whichever consumer has first finished
its previous work unit.

> And if the stuff you pass around needs disparate effort to consume,
> it seems to me that you can more easily balance the load by having
> specialised consumers, instead of instances of one humungous 
> "I can eat anything" consumer.

If there is a semantic difference, maybe yes; but I think it makes no
sense to differentiate purely on the expected execution times.

> I also think that having a queue per consumer thread makes it easier
> to replace the threads with processes and the queues with pipes or
> sockets if you need to do serious scaling later.

Perhaps, but isn't that a case of YAGNI and/or premature optimization?


-- 
The saddest aspect of life right now is that science gathers knowledge
faster than society gathers wisdom.
  -- Isaac Asimov

Roel Schroeven


