More CPUs don't equal more speed

Rob Gaddi rgaddi at highlandtechnology.invalid
Fri May 24 12:28:18 EDT 2019


On 5/23/19 6:32 PM, Cameron Simpson wrote:
> On 23May2019 17:04, bvdp <bob at mellowood.ca> wrote:
>> Anyway, yes the problem is that I was naively using command.getoutput()
>> which blocks until the command is finished. So, of course, only one
>> process was being run at one time! Bad me!
>>
>> I guess I should be looking at subprocess.Popen(). Now, a more relevant
>> question ... if I do it this way I then need to poll through a list of
>> saved process IDs to see which have finished? Right? My initial thought
>> is to batch them up in small groups (say CPU_COUNT-1) and wait for that
>> batch to finish, etc. Would it be foolish to send a large number (1200
>> in this case since this is the number of files) and let the OS worry
>> about scheduling and have my program poll 1200 IDs?
>>
>> Someone mentioned the GIL. If I launch separate processes then I don't
>> encounter this issue? Right?
> 
> Yes, but it becomes more painful to manage. If you're issuing distinct 
> separate commands anyway, dispatch many or all and then wait for them as 
> a distinct step.  If the commands start thrashing the rest of the OS 
> resources (such as the disc) then you may want to do some capacity 
> limitation, such as a counter or semaphore to limit how many go at once.
> 
> Now, waiting for a subcommand can be done in a few ways.
> 
> If you're the parent of all the processes you can keep a set() of the 
> issued process ids and then call os.wait() repeatedly, which returns the 
> pid of a completed child process. Check it against your set. If you need 
> to act on the specific process, use a dict to map pids to some record of 
> the subprocess.
> 
> Alternatively, you can spawn a Python Thread for each subcommand, have 
> the Thread dispatch the subcommand _and_ wait for it (i.e. keep your 
> command.getoutput() method, but in a Thread). Main programme waits for 
> the Threads by join()ing them.
> 
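A minimal sketch of the os.wait() approach Cameron describes (POSIX only; the command list here is invented for illustration, with each child just sleeping briefly):

```python
import os
import subprocess
import sys

# Hypothetical work list: in the real program these would be the 1200
# per-file commands; here each child just sleeps briefly.
commands = [[sys.executable, "-c", "import time; time.sleep(0.1)"]
            for _ in range(4)]

# Dispatch everything up front, mapping each child pid to its Popen
# record so a finished pid can be matched back to its command.
running = {}
for argv in commands:
    proc = subprocess.Popen(argv)
    running[proc.pid] = proc

# os.wait() blocks until *some* child exits and returns its pid, so the
# children are reaped in completion order, whatever the OS schedules.
finished = []
while running:
    pid, status = os.wait()
    running.pop(pid)
    finished.append(pid)

print(f"reaped {len(finished)} children")
```

Note the capacity-limiting point above: this sketch dispatches everything at once, which is fine for a handful of sleeps but may warrant a semaphore for 1200 real jobs.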
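And a sketch of the thread-per-command variant, keeping the blocking getoutput() call but confining it to worker threads (subprocess.getoutput is the Python 3 home of the old commands.getoutput; the commands shown are made up):

```python
import subprocess
import sys
import threading

# Hypothetical shell commands; each thread blocks in getoutput()
# independently, so the subprocesses themselves still run in parallel.
commands = [f'"{sys.executable}" -c "print({n} * {n})"' for n in range(4)]

results = {}

def run(cmd):
    # subprocess.getoutput blocks until the command completes; keys are
    # distinct, so concurrent dict writes are safe here.
    results[cmd] = subprocess.getoutput(cmd)

threads = [threading.Thread(target=run, args=(cmd,)) for cmd in commands]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the main program waits here for every worker

print(sorted(results.values()))
```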

I'll just note, because no one else has brought it up yet, that rather 
than manually creating threads and/or process pools for all these 
things, this is exactly what the standard concurrent.futures module is 
for.  It's a fairly brilliant wrapper around all this stuff, and I feel 
like it often doesn't get enough love.
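As a sketch of what that looks like (commands invented for illustration; ThreadPoolExecutor suits this workload because each task just blocks on an external process, so the GIL isn't in play):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-file commands standing in for the real 1200.
commands = [[sys.executable, "-c", f"print({n} + 1)"] for n in range(6)]

def run(argv):
    # Each worker blocks on one subprocess and returns its output.
    return subprocess.run(argv, capture_output=True, text=True).stdout.strip()

# max_workers caps how many subprocesses run at once, which replaces
# the hand-rolled batching/semaphore bookkeeping discussed above.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {pool.submit(run, argv): argv for argv in commands}
    # as_completed yields each future as its command finishes.
    results = [fut.result() for fut in as_completed(futures)]

print(sorted(results, key=int))
```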


-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.


