Multiprocessing.Queue - I want to end.

Dave Angel davea at ieee.org
Sat May 2 16:46:16 EDT 2009


Hendrik van Rooyen wrote:
> : "Roel Schroeven" <rschroev_nospam_ml at fastmail.fm> wrote:
>
>
>  
>> Hendrik van Rooyen schreef:
>>    
>>> I have always wondered why people do the one queue many getters thing.
>>>       
>> Because IMO it's the simplest and most elegant solution.
>>     
>
> That is fair enough...
>
>  
>>> Given that the stuff you pass is homogeneous in that it will require a
>>> similar amount of effort to process, is there not a case to be made
>>> to have as many queues as consumers, and to round-robin the work?
>>>       
>> Could work if the processing time for each work unit is exactly the same
>> (otherwise one or more consumers will be idle part of the time), but in
>> most cases that is not guaranteed. A simple example is fetching data
>> over the network: even if the data size is always the same, there will
>> be differences because of network load variations.
>>
>> If you use one queue, each consumer fetches a new work unit as soon as it
>> has consumed the previous one. All consumers will be working as long as
>> there is work to do, without having to write any code to do the load
>> balancing.
>>
>> With one queue for each consumer, you either have to assume that the
>> average processing time is the same (otherwise some consumers will be
>> idle at the end, while others are still busy processing work units), or
>> you need some clever code in the producer(s) or the driving code to
>> balance the loads. That's extra complexity for little or no benefit.
>>
>> I like the simplicity of having one queue: the producer(s) put work
>> units on the queue with no concern which consumer will process them or
>> how many consumers there even are; likewise the consumer(s) don't know
>> and don't need to know where their work units come from. And the work
>> gets automatically distributed to whichever consumer has first finished
>> its previous work unit.
>>     
>
> This is all true in the case of a job that starts, runs and finishes.
> I am not so sure it applies to something that has a long life.
>
>  
>>> And if the stuff you pass around needs disparate effort to consume,
>>> it seems to me that you can more easily balance the load by having
>>> specialised consumers, instead of instances of one humungous "I can 
>>> eat anything" consumer.
>>>       
>> If there is a semantic difference, maybe yes; but I think it makes no
>> sense to differentiate purely on the expected execution times.
>>     
>
> The idea is basically that you have the code that classifies in one
> place only, instead of running in all the instances of the consumer.
> Feels better to me, somehow.
>   <snip>
If the classifying you're doing is just based on expected time to 
consume the item, then I think your plan to use separate queues is 
misguided.

If the consumers are interchangeable in their abilities, then feeding 
them from a single queue is more efficient in average wait time, 
worst-case wait time, and consumer utilization, in nearly all 
non-pathological scenarios.  Think of the line at the bank.  There's a 
good reason they now have a single line for multiple tellers.  If you 
have five tellers, and one of your transactions is really slow, the 
rest of the line is only slowed down by 20%, rather than a few people 
being slowed down by a substantial amount because they happen to be 
behind the slowpoke.  Thirty years ago they had individual lines, and I 
tried in vain to explain queuing theory to the bank manager.
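
To make that concrete, here's a rough, untested sketch of the 
single-queue, multiple-consumer arrangement with multiprocessing.Queue 
(the worker function, job sizes, and the None-sentinel shutdown 
convention are just placeholders I made up for illustration):

import multiprocessing

SENTINEL = None   # one per consumer, to say "no more work"

def consumer(work_queue, name):
    # Pull work units off the shared queue until told to stop.
    while True:
        item = work_queue.get()
        if item is SENTINEL:
            break
        result = sum(range(item))   # stand-in for real work
        print("%s processed %d" % (name, item))

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=consumer,
                                       args=(queue, 'worker-%d' % i))
               for i in range(4)]
    for w in workers:
        w.start()
    for job in [10, 500000, 20, 30, 200000, 40]:
        queue.put(job)
    for _ in workers:
        queue.put(SENTINEL)   # one sentinel per consumer
    for w in workers:
        w.join()

Whichever worker frees up first grabs the next item, so one slow job 
doesn't hold up the whole line.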

Having said that, notice that sometimes the consumers in a computer are 
not independent.  If you're running 20 threads with this model, on a 
processor with only two cores, and if the tasks are CPU bound, you're 
wasting lots of thread-management time without gaining anything.  
Similarly, if there are other shared resources that all the threads trip 
over, it may not pay to do as many of them in parallel.  But for 
homogeneous consumers, do them with a single queue, and do benchmarks to 
determine the optimum number of consumers to start.
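
As a starting point for that benchmark, something like the following 
might do (again only a sketch; the job sizes and pool counts are 
arbitrary, and cpu_count() is just a first guess, not the answer):

import multiprocessing
import time

def work(n):
    # Stand-in for a CPU-bound work unit.
    return sum(range(n))

if __name__ == '__main__':
    jobs = [200000] * 40
    # Time a few pool sizes; the best count depends on how CPU-bound
    # the tasks are and what other resources they share.
    for nworkers in (1, 2, multiprocessing.cpu_count(),
                     2 * multiprocessing.cpu_count()):
        pool = multiprocessing.Pool(nworkers)
        start = time.time()
        pool.map(work, jobs)
        pool.close()
        pool.join()
        print("%2d workers: %.2f seconds" % (nworkers, time.time() - start))

On a two-core machine you'd expect the timings to flatten out (or get 
worse) somewhere around two workers for purely CPU-bound tasks.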



