More CPUs don't equal more speed

Chris Angelico rosuav at gmail.com
Thu May 23 20:22:40 EDT 2019


On Fri, May 24, 2019 at 10:07 AM Bob van der Poel <bob at mellowood.ca> wrote:
>
> Thanks all! The sound you are hearing is my head smacking against my hand!
> Or is it my hand against my head?
>
> Anyway, yes the problem is that I was naively using command.getoutput()
> which blocks until the command is finished. So, of course, only one process
> was being run at one time! Bad me!
>
> I guess I should be looking at subprocess.Popen(). Now, a more relevant
> question ... if I do it this way I then need to poll through a list of saved
> process IDs to see which have finished? Right? My initial thought is to
> batch them up in small groups (say CPU_COUNT-1) and wait for that batch to
> finish, etc. Would it be foolish to send a large number (1200 in this
> case since this is the number of files) and let the OS worry about
> scheduling and have my program poll 1200 IDs?

That might create a lot of contention, resulting in poor performance.
But depending on what your tasks saturate on, that might not matter
all that much, and it _would_ be a simple and straight-forward
technique. In fact, you could basically just write your code like
this:

processes = []
for job in jobs:
    processes.append(start_process(job))
for process in processes:
    wait_for_process(process)

Once they're all started, you just wait for the first one to finish.
Then when that's finished, wait for the next, and the next, and the
next. If the first process started is actually the slowest to run, all
the others will be in the "done" state for a while, but that's not a
big deal.
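
To make that concrete, a rough sketch with subprocess.Popen might look
like this (the "commands" list is just an assumption - build each
command however suits your files):

import subprocess

commands = [...]  # one argument list per file

# Start every process without waiting for any of them
processes = [subprocess.Popen(cmd) for cmd in commands]

# Then wait for each in turn; the order doesn't matter much
for process in processes:
    process.wait()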

> Someone mentioned the GIL. If I launch separate processes then I don't
> encounter this issue? Right?

The GIL is basically irrelevant here. Most of the work is being done
in subprocesses, so your code is spending all its time waiting.

What I'd recommend is a thread pool. Broadly speaking, it would look
something like this:

import threading

jobs = [...]

def run_jobs():
    while jobs:
        try: job = jobs.pop()
        except IndexError: break  # deal with race
        process = start_subprocess(job)
        wait_for_subprocess(process)

threads = [threading.Thread(target=run_jobs)
    for _ in range(THREAD_COUNT)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

Note that this has the same "start them all, then wait on them in
order" model. In this case, though, there won't be 1200 threads -
there'll be THREAD_COUNT of them (which may not be the same as your
CPU count, but you could use that same figure as an initial estimate).
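
For that initial estimate, something along these lines would do
(os.cpu_count() can return None on some platforms, hence the fallback):

import os

THREAD_COUNT = os.cpu_count() or 4  # one thread per CPU as a starting point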

Within each thread, the logic is also quite simple: take a job, do the
job, repeat till you run out of jobs. The GIL ensures that "job =
jobs.pop()" is a safe atomic operation that can't possibly corrupt
internal state, and will always retrieve a unique job every time. The
run_jobs function simply runs one job at a time, waiting for its
completion.
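
If each job is just an external command, those two placeholder calls
collapse into a single subprocess.run(), which starts the process and
blocks until it finishes - a rough sketch, assuming each entry in jobs
is an argument list:

import subprocess

def run_jobs():
    while jobs:
        try: job = jobs.pop()
        except IndexError: break  # another thread grabbed the last job
        # run() blocks until the command completes, so this thread
        # handles exactly one job at a time
        subprocess.run(job)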

This kind of pattern keeps everything clean and simple, and is easy to
tweak for performance.

ChrisA


