More CPUs don't equal more speed

MRAB python at mrabarnett.plus.com
Thu May 23 20:46:05 EDT 2019


On 2019-05-24 01:22, Chris Angelico wrote:
> On Fri, May 24, 2019 at 10:07 AM Bob van der Poel <bob at mellowood.ca> wrote:
>>
>> Thanks all! The sound you are hearing is my head smacking against my hand!
>> Or is it my hand against my head?
>>
>> Anyway, yes the problem is that I was naively using command.getoutput()
>> which blocks until the command is finished. So, of course, only one process
>> was being run at one time! Bad me!
>>
>> I guess I should be looking at subprocess.Popen(). Now, a more relevant
>> question ... if I do it this way I then need to poll through a list of saved
>> process IDs to see which have finished? Right? My initial thought is to
>> batch them up in small groups (say CPU_COUNT-1) and wait for that batch to
>> finish, etc. Would it be foolish to send a large number (1200 in this
>> case since this is the number of files) and let the OS worry about
>> scheduling and have my program poll 1200 IDs?
> 
> That might create a lot of contention, resulting in poor performance.
> But depending on what your tasks saturate on, that might not matter
> all that much, and it _would_ be a simple and straight-forward
> technique. In fact, you could basically just write your code like
> this:
> 
> for job in jobs:
>      start_process()
> for process in processes:
>      wait_for_process()
> 
> Once they're all started, you just wait for the first one to finish.
> Then when that's finished, wait for the next, and the next, and the
> next. If the first process started is actually the slowest to run, all
> the others will be in the "done" state for a while, but that's not a
> big deal.
> 
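Concretely, that first pattern might look something like this with
subprocess.Popen (just a sketch; it assumes each job is a shell command
string, and 'cmds' is a placeholder name):

import subprocess

cmds = [...]  # the 1200 command strings to run (placeholder)

# Start every process without waiting for any of them.
procs = [subprocess.Popen(cmd, shell=True) for cmd in cmds]

# Wait for each one in turn.  If the first one happens to be the
# slowest, the rest will simply already be finished by the time we
# get to them.
for proc in procs:
    proc.wait()

As noted above, launching all 1200 at once may cause a lot of contention,
but it is the simplest version.
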
>> Someone mentioned the GIL. If I launch separate processes then I don't
>> encounter this issue? Right?
> 
> The GIL is basically irrelevant here. Most of the work is being done
> in subprocesses, so your code is spending all its time waiting.
> 
> What I'd recommend is a thread pool. Broadly speaking, it would look
> something like this:
> 
> jobs = [...]
> 
> def run_jobs():
>      while jobs:
>          try: job = jobs.pop()
>          except IndexError: break # deal with race
>          start_subprocess()
>          wait_for_subprocess()
> 
> threads = [threading.Thread(target=run_jobs)
>      for _ in range(THREAD_COUNT)]
> for thread in threads:
>      thread.start()
> for thread in threads:
>      thread.join()
> 
> Note that this has the same "start them all, then wait on them in
> order" model. In this case, though, there won't be 1200 threads -
> there'll be THREAD_COUNT of them (which may not be the same as your
> CPU count, but you could use that same figure as an initial estimate).
> 
> Within each thread, the logic is also quite simple: take a job, do the
> job, repeat till you run out of jobs. The GIL ensures that "job =
> jobs.pop()" is a safe atomic operation that can't possibly corrupt
> internal state, and will always retrieve a unique job every time. The
> run_jobs function simply runs one job at a time, waiting for its
> completion.
> 
> This kind of pattern keeps everything clean and simple, and is easy to
> tweak for performance.
> 
Personally, I'd use a queue (from the 'queue' module) instead of a list for
the job pool.
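
Something like this, with the subprocess calls filled in (again only a
sketch; 'cmds', 'worker' and THREAD_COUNT are placeholder names, and it
assumes each job is a shell command string):

import queue
import subprocess
import threading

THREAD_COUNT = 8   # tune as needed; CPU count is a reasonable starting point
cmds = [...]       # the 1200 command strings to run (placeholder)

job_queue = queue.Queue()
for cmd in cmds:
    job_queue.put(cmd)

def worker():
    while True:
        try:
            cmd = job_queue.get_nowait()
        except queue.Empty:
            return                     # no jobs left, this thread is done
        # subprocess.run() starts the command and waits for it to finish,
        # so each thread handles one job at a time.
        subprocess.run(cmd, shell=True)

threads = [threading.Thread(target=worker) for _ in range(THREAD_COUNT)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Because every command is put on the queue before the threads start,
queue.Empty means there is genuinely no work left, so a worker can simply
return; the queue also does the locking that the list version handles with
the pop()/IndexError dance.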


