Fastest first

Avi Gross avigross at verizon.net
Sun Dec 16 19:00:40 EST 2018


I have a serious question about what is already available out there in a
herpetologist's dream: pythonland.

 

SHORT VERSION: a way to automatically run multiple algorithms in parallel
and kill the rest when one returns an answer.

 

I know I can write this by hand. But it looks like the kind of thing that
would come in handy.

 

MOTIVATION: I was recently discussing alternate GENERAL ways to solve a
puzzle. Depending on the nature of the input, it seems likely that some
methods will run faster in some scenarios while others win in different
ones. It may not be easy to determine in advance which to use, so I wonder:
why not use ALL of them and stop when an answer arrives?

 

I know how to use all kinds of multitasking in python, ranging from threads
running in the same process, to multiple external processes that can
communicate with the parent or each other, to processes running on different
machines over the internet. In each case you can obviously start algorithm A,
then B, and as many more as you want. When a (final) solution is reached, it
can be sent back to the consolidator (such as a parent), which can presumably
kill local threads by thread number or processes by process ID easily enough.
Remote executions might be handled by sending suicide messages through
sockets.
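
To make the idea concrete, here is a minimal sketch along those lines using
multiple processes and a shared queue. The names race, _worker and
solve_a/solve_b/solve_c are my own, and the candidate algorithms are assumed
to be picklable module-level functions (with the call itself guarded by
if __name__ == "__main__" on platforms that spawn rather than fork):

import multiprocessing as mp

def _worker(func, data, queue):
    # Run one candidate algorithm and report its answer to the parent.
    queue.put(func(data))

def race(funcs, data):
    queue = mp.Queue()
    procs = [mp.Process(target=_worker, args=(f, data, queue))
             for f in funcs]
    for p in procs:
        p.start()
    answer = queue.get()      # block until the first answer arrives
    for p in procs:
        p.terminate()         # kill the stragglers via their process handles
        p.join()
    return answer

# answer = race([solve_a, solve_b, solve_c], puzzle_input)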

 

Since some algorithms may take orders of magnitude longer than others, and
may never terminate at all, letting them continue might be wasteful. Even if
they all share the same CPU, the winner might be slowed down relative to
running alone, but perhaps not by much. If they share the same process, the
slowdown for N running together might be in the neighborhood of taking N
times as long; it may be less or more. If run as another independent process
on a machine that already has many other processes running, it may degrade
performance by only a few percent. When running on other machines, there is
no real impact except the overhead of communications.

 

I am pretty sure there are already many variations out there. When Google
searches what has to be a gigantic, partially indexed mass of data, I think
it farms out bits of the work to multiple processes and probably processors,
as no ONE of them can sift through much of that data in real time. Much of
the search may be done in computer memory. So a search may spawn a massive
attempt, and results that come back are consolidated and displayed. Perhaps
after a second or two, no additional data is wanted. It will happily let you
page through the first billion pages it found.

 

What would be nice is to load a module, give it a list of functions to call,
and ask to be given an answer with a guarantee that no functions (or
processes) are still running. In real life, this could be complex if some of
your functions invoked a similar method to farm out their own work. You
would be killing whole process trees.
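
Something like the following sketch is what I have in mind, assuming each
entry is a (function, args) pair of picklable, module-level callables;
first_answer and _call are names I made up:

import multiprocessing as mp

def _call(entry):
    # Unpack one (function, args) pair and run it in a worker process.
    func, args = entry
    return func(*args)

def first_answer(entries, processes=None):
    """Return the first answer any entry produces; the rest are killed."""
    with mp.Pool(processes or len(entries)) as pool:
        # imap_unordered yields results in completion order, so the first
        # value pulled is the winner.  Leaving the with-block calls
        # pool.terminate(), so no stragglers survive.
        return next(pool.imap_unordered(_call, entries))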

 

An example is what happens if you ask for a translation of text written in
an unknown language (I mean Hungarian versus English versus French, for
example). One solution is to try a list of translators in sequence and, if
they return nonsense, try the next. But if you tried them in parallel and
one came back with a 98% probability that it recognized the language,
because it found places with a “zs” combination, found characters like an o
with an umlaut and another o with single/double accents, and recognized a
few words like “születésnapot”, you can pretty much rule out English and
French. You might take a similar approach with multiple variations on SPAM
detectors and call a halt when any one says a message seems safe, or vice
versa.
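
Using the hypothetical first_answer() helper from above, and some equally
hypothetical detector functions that only return once they are confident of
a match, the race might look like this:

text = "Boldog születésnapot!"
language = first_answer([(detect_hungarian, (text,)),
                         (detect_english, (text,)),
                         (detect_french, (text,))])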

 

I would call this a KILL-the-stragglers, survival-of-the-fittest strategy.
