Bi-directional sub-process communication

Israel Brewster israel at ravnalaska.net
Mon Nov 23 12:54:38 EST 2015


I have a multi-threaded Python app (a CherryPy web app, to be exact) that launches a child process that it then needs to communicate with bi-directionally. To implement this, I have used a pair of Queues: a child_queue, which I use for master->child communication, and a master_queue, which is used for child->master communication.

The way I have the system set up, the child process runs a loop in a thread that waits for messages on child_queue and, when one is received, responds appropriately depending on the message, which sometimes involves posting a message to master_queue.

On the master side, when it needs to communicate with the child process, it posts a message to child_queue, and if the request requires a response it will then immediately start waiting for a message on master_queue, typically with a timeout.
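For reference, the arrangement described above boils down to something like the following minimal sketch. It is simplified and not my actual code: the child loop runs as the child's main function rather than in a thread, and the 'STOP' message, the function names, and port 8888 are made up for illustration.

```python
import multiprocessing
import queue  # only needed for the Empty exception raised on timeout

DEFAULT_PORT = 5000  # fallback value used when no reply arrives


def child_loop(child_queue, master_queue, port=8888):
    """Child side: handle one message at a time until told to stop."""
    while True:
        item = child_queue.get()        # block until the master posts something
        if item == 'STOP':              # hypothetical shutdown message
            break
        elif item == 'GET_PORT':
            master_queue.put(port)      # reply on the child->master queue
        # other message types would be handled here


def request_port(child_queue, master_queue, timeout=3):
    """Master side: post a request, then wait for the reply."""
    child_queue.put('GET_PORT')
    try:
        return master_queue.get(timeout=timeout)
    except queue.Empty:
        return DEFAULT_PORT


if __name__ == '__main__':
    child_queue = multiprocessing.Queue()    # master -> child
    master_queue = multiprocessing.Queue()   # child -> master
    child = multiprocessing.Process(
        target=child_loop, args=(child_queue, master_queue), daemon=True)
    child.start()
    print(request_port(child_queue, master_queue))
    child_queue.put('STOP')
    child.join(timeout=5)
```

In the real app, request_port() is what any of the CherryPy threads might call, which is where the concern below comes from.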

While this process works well in testing, I do have one concern (maybe unfounded) and one real-world issue.

Concern: Since the master process is multi-threaded, it seems likely enough that multiple threads on the master side will make requests at the same time. I understand that the Queue class has locks that make this fine (one thread will complete posting the message before the next is allowed to start), and since the child process only has a single thread processing messages from the queue, it should process them in order and post the responses (if any) to master_queue in order. But now I have multiple master threads all trying to read master_queue at the same time. Again, the locks will take care of this and prevent any overlapping reads, but am I guaranteed that the threads will obtain the lock, and therefore read the responses, in the right order? Or is there a possibility that, say, thread three will get the response that should have been for thread one? Is this something I need to take into consideration, and if so, how?
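To make the "how" part of the question concrete, here is a minimal sketch of one possible approach: tag every request with an ID, have the child echo the ID back, and let a single reader thread on the master side hand each reply to the thread that asked for it. This is not what my app currently does, and ReplyRouter, its methods, and port 8888 are hypothetical names and values chosen for illustration. The sketch uses queue.Queue objects, on the assumption that multiprocessing.Queue objects would be passed in the same way.

```python
import itertools
import queue
import threading


class ReplyRouter:
    """Hypothetical helper: matches each reply to the thread that asked."""

    def __init__(self, child_queue, master_queue):
        self.child_queue = child_queue
        self.master_queue = master_queue
        self._ids = itertools.count()
        self._pending = {}               # request id -> private reply queue
        self._lock = threading.Lock()
        threading.Thread(target=self._read_replies, daemon=True).start()

    def _read_replies(self):
        # The only reader of master_queue, so lock ordering no longer matters.
        while True:
            request_id, payload = self.master_queue.get()
            with self._lock:
                waiter = self._pending.pop(request_id, None)
            if waiter is not None:       # None means the asker already gave up
                waiter.put(payload)

    def request(self, message, timeout=3, default=None):
        waiter = queue.Queue(maxsize=1)
        with self._lock:
            request_id = next(self._ids)
            self._pending[request_id] = waiter
        self.child_queue.put((request_id, message))
        try:
            return waiter.get(timeout=timeout)
        except queue.Empty:
            with self._lock:
                self._pending.pop(request_id, None)  # a late reply is dropped
            return default


def child_loop(child_queue, master_queue, port=8888):
    """Child side: echo the request id back with each reply."""
    while True:
        request_id, item = child_queue.get()
        if item == 'GET_PORT':
            master_queue.put((request_id, port))
```

With something like this, each master thread would call router.request('GET_PORT', default=5000) instead of reading master_queue directly, so which thread wins the lock on the shared queue would no longer matter.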

Real-world problem: While, as I said, this system worked well in testing, now that I have gotten it out into production I've occasionally run into a problem where the master thread waiting for a response on master_queue times out. This causes a (potentially) two-fold problem: first, the master process doesn't get the information it requested, and second, I *could* end up with an "orphaned" message on the queue that could cause problems the next time I try to read something from it.
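The "orphaned" message scenario can be reproduced in isolation with a small simulation. This is only a sketch: a plain queue.Queue stands in for the real master_queue, the reply strings are made up, and the child's late answer is simulated by putting items on the queue by hand.

```python
import queue


def wait_for_reply(master_queue, timeout):
    """Master side: wait for a reply, or give up and return None."""
    try:
        return master_queue.get(timeout=timeout)
    except queue.Empty:
        return None


def simulate_orphaned_reply():
    master_queue = queue.Queue()   # stands in for the child->master queue

    # Request 1: the child is busy, so the master gives up waiting.
    first = wait_for_reply(master_queue, timeout=0.05)

    # The child finally answers request 1, after the master stopped listening.
    master_queue.put('reply to request 1')

    # Request 2: the child answers promptly, but the stale reply is ahead of it.
    master_queue.put('reply to request 2')
    second = wait_for_reply(master_queue, timeout=0.05)

    # Returns what each request saw, plus how many replies are left over.
    return first, second, master_queue.qsize()


if __name__ == '__main__':
    print(simulate_orphaned_reply())
```

Here the second request receives the answer meant for the first, and the real answer to the second request stays on the queue for whoever reads next.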

I currently have the timeout set to 3 seconds. I can, of course, increase that, but that could lead to a bad user experience - and might not even help the situation if something else is going on. The actual exchange is quite simple:

On the master side, I have this code:

# Empty comes from the queue module (named Queue on Python 2)
config.socket_queue.put('GET_PORT')
try:
    port = config.master_queue.get(timeout=3)  # wait up to three seconds for a response
except Empty:
    port = 5000  # default. Can't hurt to try.

Which, as you might have been able to guess, tries to ask the child process (an instance of a tornado server, btw) what port it is listening on. The child process then, on getting this message from the queue, runs the following code:

elif item == 'GET_PORT':
    port = utils.config.getint('global', 'tornado.port')
    master_queue.put(port)

So nothing that should take any significant time. Of course, since this is a single thread handling any number of requests, it is possible that the thread is tied up responding to a different request (or that the GIL is preventing the thread from running at all, since another thread might be commandeering the processor), but I find it hard to believe that it could be tied up for more than three seconds.

So is there a better way to do sub-process bi-directional communication that would avoid these issues? Or do I just need to increase the timeout (or remove it altogether, at the risk of potentially causing the thread to hang if no message is posted)? And is my concern justified, or just paranoid? Thanks for any information that can be provided!

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------






