Multiprocessing.Queue - I want to end.

Luis Alberto Zarrabeitia Gomez kyrie at uh.cu
Mon May 4 00:56:26 EDT 2009


Quoting Hendrik van Rooyen <mail at microcorp.co.za>:

>  "Luis Zarrabeitia" <akakyrie at uh.cu> wrote:
> 
> 8< -------explanation and example of one producer, --------
> 8< -------more consumers and one queue --------------------
> 
> >As you can see, I'm sending one 'None' per consumer, and hoping that no 
> >consumer will read more than one None. While this particular implementation
> 
> 
> You don't have to hope. You can write the consumers that way to guarantee
> it.

I did. But that solution is not very reusable (I _must_ remember that
implementation detail every time) and, most importantly, I'll have to remember
it in a few months when I'm updating the code.
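
For reference, the pattern I'm describing (one None per consumer, each consumer
stopping at the first None it sees) looks roughly like this. Just a minimal
sketch, not my actual code; process_item stands in for the real work:

import multiprocessing

NUM_CONSUMERS = 4

def process_item(item):
    pass  # placeholder for the real per-item work

def consumer(queue):
    # Each consumer takes items until it sees its one None, then stops;
    # reading a second None would steal another consumer's sentinel.
    for item in iter(queue.get, None):
        process_item(item)

def producer(queue, items):
    for item in items:
        queue.put(item)
    # One sentinel per consumer -- the detail I have to remember.
    for _ in range(NUM_CONSUMERS):
        queue.put(None)

if __name__ == '__main__':
    q = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=consumer, args=(q,))
               for _ in range(NUM_CONSUMERS)]
    for w in workers:
        w.start()
    producer(q, range(100))
    for w in workers:
        w.join()

It works, but the "exactly one None per consumer" rule lives in my head rather
than in the queue.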


> >ensures that, it is very fragile. Is there any way to signal the consumers?
>  
> Signalling is not easy - you can signal a process, but I doubt if it is 
> possible to signal a thread in a process.
> 
> >(or better yet, the queue itself, as it is shared by all consumers?) 
> >Should "close" work for this? (raise the exception when the queue is 
> >exhausted, not when it is closed by the producer).
> 
> I haven't the foggiest if this will work, and it seems to me to be kind
> of involved compared to passing a sentinel or sentinels.

Well, that would be a valid signal. Too bad I have to pass it by hand, instead
of the queue class taking care of it for me.
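
Something like this wrapper is what I have in mind when I say the queue class
should take care of it for me. Just a sketch of the idea; the class and method
names are made up:

import multiprocessing

class IterableQueue(object):
    """Wrap a multiprocessing.Queue so consumers can simply iterate and
    the producer can end the stream once, for everyone."""

    _SENTINEL = None  # assumes None never appears as real data

    def __init__(self, n_consumers):
        self._queue = multiprocessing.Queue()
        self._n_consumers = n_consumers

    def put(self, item):
        self._queue.put(item)

    def close(self):
        # Producer side: push one sentinel per consumer, in one place.
        for _ in range(self._n_consumers):
            self._queue.put(self._SENTINEL)

    def __iter__(self):
        # Consumer side: stop at the first sentinel we see.
        return iter(self._queue.get, self._SENTINEL)

The consumer body then becomes just "for item in queue: ...", and the
one-None-per-consumer rule lives in exactly one place.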

> I have always wondered why people do the one queue many getters thing.
> 
> Given that the stuff you pass is homogenous in that it will require a
> similar amount of effort to process, is there not a case to be made
> to have as many queues as consumers, and to round robin the work?

Abstraction. This problem maps nicely onto producer-consumer (in fact, it is a
classic producer-consumer problem). I could take care of the scheduling myself,
but my OS already has good scheduling algorithms that take into account both
the available CPU and I/O.

The solution may not even be a queue (in my case, I don't care about the order
in which the elements are processed, only that they all are), but ideally I
would just be 'yielding' results in my producer(s) and receiving them in my
consumer(s), leaving the IPC mechanism to deal with how to move the data from
producers to consumers (and to which consumers).
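
For the simple single-producer case, the stdlib's Pool already comes close to
what I mean. A sketch, with process_item again standing in for the real work:

import multiprocessing

def process_item(item):
    return item * 2  # placeholder for the real per-item work

def produce_items():
    # The producer just yields; no sentinels, no manual scheduling.
    for i in range(100):
        yield i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    # imap_unordered hands items to whichever worker is free and yields
    # results in whatever order they finish -- which is fine, since I
    # don't care about the order.
    for result in pool.imap_unordered(process_item, produce_items()):
        print(result)
    pool.close()
    pool.join()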

> And if the stuff you pass around needs disparate effort to consume,
> it seems to me that you can more easily balance the load by having
> specialised consumers, instead of instances of one humungous 
> "I can eat anything" consumer.

Not necessarily. The load may depend on the size of the data being sent. The
consumers receive the same kind of data, only the sizes differ (and
unpredictably so). Again, I could try to implement some heuristics to guess
which processor has the lower load, but I'd rather delegate that to the OS.

> I also think that having a queue per consumer thread makes it easier
> to replace the threads with processes and the queues with pipes or
> sockets if you need to do serious scaling later.

This is already multi-process. It would be nice to extend it to multiple
computers later, but the extra complexity is not worth it right now.

> In fact I happen to believe that anything that does any work needs 
> one and only one input queue and nothing else, but I am peculiar
> that way.

Well, I also need some output. In my case, the outputs are files with the
results of the processing, which can be summarized later (hence the need to
'join' the processes, to know when I can summarize them).
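
Concretely, the overall shape is something like this (a sketch; process_item,
summarize and the file handling are stand-ins for the real thing):

import multiprocessing

def process_item(item):
    return item * 2  # placeholder for the real per-item work

def consumer(queue, output_path):
    # Each consumer writes its results to its own output file.
    with open(output_path, 'w') as out:
        for item in iter(queue.get, None):
            out.write('%s\n' % process_item(item))

def summarize(paths):
    # Placeholder: combine the per-consumer output files.
    return sum(1 for path in paths for _ in open(path))

if __name__ == '__main__':
    paths = ['out_%d.txt' % i for i in range(4)]
    q = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=consumer, args=(q, path))
               for path in paths]
    for w in workers:
        w.start()
    for item in range(100):
        q.put(item)
    for _ in workers:
        q.put(None)   # one sentinel per consumer, again
    for w in workers:
        w.join()      # only now is it safe to summarize
    print(summarize(paths))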

Thank you.

-- 
Luis Zarrabeitia
Facultad de Matemática y Computación, UH
http://profesores.matcom.uh.cu/~kyrie
