Fastest first

Avi Gross avigross at verizon.net
Mon Dec 24 16:29:51 EST 2018


Stefan,

Yes, techniques like the one you mention are among the approaches I am aware of, but they may not be the easiest to automate.

I was thinking of a style of solving problems. I note Python (on various platforms) has an amazing number of ways to do things in parallel and quite a few ways for those tasks to communicate with the parent(s) or with each other.

My goal was not so much to kill processes but to make a killing.

OK, maybe not. Just a pun.

The obvious goal is to have a framework where you have multiple parties trying to solve a problem using divergent methods, with no obvious way in advance to figure out which will be first with a good answer. Some of the algorithms might be heuristic, and you might not want the first one that finishes but rather the first one that returns a value, say, within 2% of the best possible result. An example would be designing an airplane that comes within 2% of the performance of one with negligible friction flying in fairly empty space. If the best current plane is only within 20% of that, then any algorithm getting within 10% might be good enough.

And some algorithms may not reliably terminate on the data supplied. Eventually they need to be stopped even if no other solution comes in.

Your message seems to be talking about threads running within the same process as the parent and being time-switched. If writing code for that environment, I am aware of many ways of building the module as described. Yes, you want the threads to regularly yield back control, for many reasons, including preventing some from dominating at the expense of others. So if they are regularly pausing and yielding, you can set it up so they get a message and perform carefully controlled apoptosis. They might carefully clean up whatever is needed, such as closing open files, releasing any locks, deleting any objects used exclusively by them, perhaps scribbling away some data, and then exit. Sure, the parent could just kill them, but that can leave a mess.
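To illustrate that cooperative wind-down, here is a minimal sketch using threading.Event from the standard library. The worker and its bookkeeping names (worker, stop_event, results) are invented for the example, not part of any existing framework.

import threading
import time

def worker(name, stop_event, results):
    """Pretend algorithm: does a bit of work per step and checks for a stop request."""
    partial = 0
    try:
        for step in range(1000000):
            if stop_event.is_set():      # the parent has asked us to stop
                break
            partial += step              # stand-in for one unit of real work
            time.sleep(0.001)            # yield so other threads get a turn
    finally:
        # Controlled "apoptosis": close files, release locks, record the
        # last partial result, then return (which ends the thread).
        results[name] = partial

stop = threading.Event()
results = {}
threads = [threading.Thread(target=worker, args=("algo%d" % i, stop, results))
           for i in range(3)]
for t in threads:
    t.start()

time.sleep(0.05)   # pretend some other algorithm just produced a good answer
stop.set()         # politely ask the remaining workers to wind down
for t in threads:
    t.join()
print(results)     # each worker left its last partial result behind

The key point is that the parent only sets the event; each worker decides when to notice it and does its own cleanup in the finally block.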

And your other suggestions also make sense in some scenarios. Yes, a thread that seems to be making slow progress might even suspend itself for a while and check to see if others have made more progress.

Let me be very concrete here. I am thinking of an example such as wanting to calculate pi to some enormous number of decimal places. Don't ask why or it will spoil the mood.

There are a stupendous number of ways to calculate pi. Many involve summing the terms of an infinite series. Some do very little on each iteration and some do very complex things. Most of them can be viewed as gradually having more and more digits trailing off to the "right" that no longer change with each iteration, while the ones further to the right are clearly not significant yet since they keep changing. So an algorithm might stop every hundred iterations and calculate how many digits seem quite firm since the last check. One algorithm might report to a central bulletin board that it has gained five more digits, another might report 0, and yet another might report 352. It might make sense for only the fastest ones to continue at full pace. But if an algorithm later slows down, perhaps another one should be revived. A quick search online shows lots of ways to do such a set of analyses, but note that many do not return pi directly but something like pi/4 or 1/pi. There is even a formula that calculates the nth digit of pi without calculating any that came before!
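As a toy version of one such worker (purely illustrative; the choice of series, checkpoint interval, and reporting style are mine, not a recommendation), a generator computing pi with the Nilakantha series could report, every hundred iterations, how many leading digits have stopped changing since the last check:

from decimal import Decimal, getcontext

def nilakantha_progress(checkpoint=100, precision=60):
    """Yield (iteration, stable_digit_count, firm_prefix) every `checkpoint` steps."""
    getcontext().prec = precision
    pi = Decimal(3)
    sign = 1
    n = 2
    last_snapshot = str(pi)
    iteration = 0
    while True:
        # Nilakantha series: pi = 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ...
        pi += sign * Decimal(4) / (Decimal(n) * (n + 1) * (n + 2))
        sign = -sign
        n += 2
        iteration += 1
        if iteration % checkpoint == 0:
            snapshot = str(pi)
            # Count leading characters that match the previous snapshot:
            # those digits look "firm" and are unlikely to change again.
            stable = 0
            for a, b in zip(snapshot, last_snapshot):
                if a != b:
                    break
                stable += 1
            last_snapshot = snapshot
            yield iteration, stable, snapshot[:stable]

# Example use: report progress a few times, then stop.
for iteration, stable, firm_digits in nilakantha_progress():
    print(iteration, stable, firm_digits)
    if iteration >= 500:
        break

A real "bulletin board" would replace the print with a write to a shared queue or file that the controller inspects when deciding which workers to slow down or revive.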

Clearly you might be able to start hundreds of such methods at once. It may be better not to have them all within one process. Since none of them would terminate before reaching, say, the billionth digit, and even the fastest may run for a very long time, you can imagine a scheme where each one checks its progress periodically and the slowest one quits every 10 minutes until only the fastest survive and perhaps are now running faster.

But none of this applies to my original question. I am talking about whether anyone has created a general framework for a much simpler scenario. Loosely speaking, I might be asked to create a list of objects or functions that can somehow be started independently in the same way. I hand this over to some module that takes my list and does the rest and eventually returns results. That module might accept additional choices and instructions that fine-tune things. But the overall code might look like:

Import some module
Make a new object that will run things for me as some kind of controller.
Invoke incantations on that controller that include setting things and especially giving it the list of things to do.
I ask it to get started and either wait or go do something else.
At some later point I find it has completed. The results of the problem are somewhere I can reach. Perhaps there are logs I can search to see more details like which methods quit or were stopped or what their last result was before ...

Please see the above as an outline or example, not exact details or designs.
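To make that outline concrete, here is a rough sketch of how such a controller could be assembled today from the standard library's concurrent.futures. The helper run_first_to_finish and the dummy methods are invented for illustration; this is not an existing package, just one way the pieces might fit.

import concurrent.futures
import time

def slow_method():          # stand-ins for the real competing algorithms
    time.sleep(2)
    return "slow answer"

def fast_method():
    time.sleep(0.2)
    return "fast answer"

def run_first_to_finish(methods, timeout=None):
    """Start every callable in `methods` and return the first result produced."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(methods))
    futures = [pool.submit(m) for m in methods]
    done, not_done = concurrent.futures.wait(
        futures, timeout=timeout,
        return_when=concurrent.futures.FIRST_COMPLETED)
    for f in not_done:
        f.cancel()               # only helps futures that have not started yet
    pool.shutdown(wait=False)    # don't block on stragglers; they finish on their own
    return next(iter(done)).result() if done else None

print(run_first_to_finish([slow_method, fast_method]))   # -> "fast answer"

One honest limitation: threads that are already running cannot be force-killed, which is exactly why the cooperative shutdown discussed earlier matters. To truly terminate stragglers you would reach for multiprocessing.Process and its terminate() method, at the cost of heavier communication between processes.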

I simply think such a utility may already have been built out there, as well as many variations. Just as an example, someone recently mentioned to me that there is a module with a function called islice() that might do exactly what was wanted. So, of course, I read about everything else in that module: https://docs.python.org/2/library/itertools.html

And I found quite a variety of other functions to create combinations of iterables and ways to combine them to make even more. Others have already considered what might be of use, and many such tools are being shared. And many are well debugged, so unless I am deliberately trying to figure out how to do something by myself (which I admit I often do), this is a good way to go.

So although I am happy to discuss ways to do this kind of task, I am really asking whether this is a common enough way of looking at solving problems that many solutions have already appeared, including some that are out there and ready to use.

Clearly, though, some such solutions would be quite specific to particular methods of dividing up tasks, such as using processes versus threads, or even distributing computation across the cloud. Some might be even more dynamic and allow some of the running processes to throw additional ones into the mix as variants of their methods.

I will stop here even if it means I have to kill myself. 😉

Avi
-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On Behalf Of Stefan Behnel
Sent: Monday, December 24, 2018 9:46 AM
To: python-list at python.org
Subject: Re: Fastest first

Avi Gross schrieb am 17.12.18 um 01:00:
> SHORT VERSION: a way to automatically run multiple algorithms in 
> parallel and kill the rest when one returns an answer.

One (somewhat seasonal) comment on this: it doesn't always have to be about killing (processes or threads). You might also consider a cooperative implementation, where each of the algorithms is allowed to advance by one "step" in each "round", and is simply discarded when a solution is found elsewhere, or when it becomes unlikely that this specific algorithm will contribute a future solution. This could be implemented via a sequence of generators or coroutines in Python. Such an approach is often used in simulations (e.g. SimPy and other "discrete event" simulators), where exact control over the concurrency pattern is desirable.

Stefan

--
https://mail.python.org/mailman/listinfo/python-list



