More CPUs doesn't equal more speed

Avi Gross avigross at verizon.net
Thu May 23 17:41:36 EDT 2019


Bob,

As others have noted, you have not made it clear how what you are doing
actually runs "in parallel."

I have a similar need: I have thousands of folders, need to run an analysis
based on the contents of one folder at a time, and have 8 cores available,
but the process may run for months if run linearly. The results are placed
within the same folder, so each part can run independently as long as shared
resources like memory are not abused.

Your need is conceptually simple. Break up the list of filenames into N
batches of about equal length. A simple approach might be to open N terminal
or command windows and, in each one, start a Python interpreter by hand
running the same program, which gets one of the file lists and works on it.
Some may finish way ahead of others, of course. If anything they do writes
to shared resources such as log files, you may want to be careful. And there
is no guarantee that several will not run on the same CPU. There is also
plenty of overhead associated with starting full processes. I am not
suggesting this, but it is fairly easy to do and may get you enough of a
speedup. Since your whole run only seems to take a few minutes, though, the
savings won't be much.
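Just to make the batch-splitting idea concrete, here is a minimal sketch;
the names all_files.txt and worker.py are invented for illustration, and the
real division of labor would depend on your own script:

    # Rough sketch only: "worker.py" is a hypothetical script that takes
    # a list file of names and processes each one in turn.
    import subprocess
    import sys

    N = 4                               # how many interpreters to launch
    filelist = [line.strip() for line in open("all_files.txt")]

    # Split the list into N batches of roughly equal length.
    batches = [filelist[i::N] for i in range(N)]

    procs = []
    for n, batch in enumerate(batches):
        listname = "batch%02d.txt" % n
        with open(listname, "w") as f:
            f.write("\n".join(batch))
        # Start an independent Python process to work on this batch.
        procs.append(subprocess.Popen([sys.executable, "worker.py", listname]))

    for p in procs:
        p.wait()                        # wait for every batch to finish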

Quite a few other solutions involve some form of threads running within a
single process, perhaps using a queue manager. Python has multiple ways to
do this. You would simply feed all the info needed (file names, in your
case) to a thread that manages a queue. It would allow up to N worker
threads to be started and, whenever one finishes, would be woken to start a
replacement until the queue is done. Unless one such thread takes very long,
they should all finish reasonably close to each other. Again, there are lots
of details to get right so the threads do not conflict with each other. And
there is no guarantee which core they get unless you use an underlying
package that manages that.
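
The standard library will do that queue management for you. A minimal sketch
using concurrent.futures; doit() and filelist here are just placeholders for
your own function and file list:

    from concurrent.futures import ThreadPoolExecutor

    N = 4                     # number of worker threads to keep busy

    def doit(filename):
        ...                   # your existing per-file work goes here

    filelist = [...]          # your real list of file names

    with ThreadPoolExecutor(max_workers=N) as pool:
        # map() hands out one filename at a time; as soon as a worker
        # finishes it is given the next name until the list is exhausted.
        list(pool.map(doit, filelist))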

So you might want to research available packages that do much of the work
for you and provide some guarantees.
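
One such package is the standard library's own multiprocessing, which runs
the same pattern in separate processes that the operating system is free to
schedule across cores; again just a sketch, reusing the placeholders above:

    import multiprocessing

    def doit(filename):
        ...                   # your existing per-file work goes here

    if __name__ == "__main__":
        filelist = [...]      # your real list of file names
        with multiprocessing.Pool(processes=4) as pool:
            pool.map(doit, filelist)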

An interesting question is how to choose the value of N. Just because you
have N cores, you do not necessarily choose N. There are other things
happening on the same machine, with sometimes thousands of processes or
threads in the queue even when the machine is sitting there effectively
doing nothing. If you will also keep multiple things open (mailer, WORD,
browsers, ...) you need some headroom so everything else gets enough
attention. So is N-1 or N-2 better? Then again, if your task has a mix of
CPU and I/O activity, it may make sense to run more than N in parallel:
even if several of them end up on the same core, they may interleave with
each other, one making use of the CPU while the others are waiting on I/O
or anything slower.
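
As a starting point, something like the following is a common rule of thumb
(nothing more than that):

    import os

    # Leave one core free for the rest of the machine; lean toward more
    # workers than cores when the jobs spend much of their time on I/O.
    cores = os.cpu_count() or 1
    n_cpu_bound = max(1, cores - 1)
    n_io_bound  = cores * 2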

I am curious to hear what you end up with. I will be reading to see if
others can point to modules that already support something like this with
you supplying just a function to use for each thread.

I suggest you consider your architecture carefully. Sometimes it is better
to run program A (in Python or anything else) that sets up what is needed,
including saving to disk the various data structures needed for each
individual run. Then you start the program that reads from the above, does
the parallel computations, and again writes out what is needed, such as log
entries or data in a CSV file. Finally, when it is all done, another program
gathers in the various outputs and produces a consolidated set of info. That
may be extra work but minimizes the chance of the processes interfering with
each other. It also may allow you to run or re-run smaller batches, or even
to farm the work out to other machines. If you create a few thousand
directories (or just files) with names like do0001, then you can copy them
to another machine where you ask it to work on do0*, and yet another on
do1*, and so on, using the same script. This makes more sense for my
project, which literally may take months or years if run exhaustively on
something like a grid search trying huge numbers of combinations.
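
The middle (parallel) stage in that scheme can be a script that just takes
the directory pattern on the command line; a sketch, where input.dat,
result.txt and do_analysis() are all invented placeholders:

    import glob
    import os
    import sys

    def do_analysis(text):
        ...                   # the real per-directory work goes here
        return text

    # Run as:  python worker.py "do0*"
    pattern = sys.argv[1]

    for dirname in sorted(glob.glob(pattern)):
        # Read what the setup program left there, do the work, and write
        # the result back into the same directory so nothing is shared.
        with open(os.path.join(dirname, "input.dat")) as f:
            data = f.read()
        with open(os.path.join(dirname, "result.txt"), "w") as f:
            f.write(do_analysis(data))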

Good luck.

Avi

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of Bob van der Poel
Sent: Thursday, May 23, 2019 2:40 PM
To: Python <python-list at python.org>
Subject: More CPUs doesn't equal more speed

I've got a short script that loops through a number of files and processes
them one at a time. I had a bit of time today and figured I'd rewrite the
script to process the files 4 at a time by using 4 different instances of
python. My basic loop is:

for i in range(0, len(filelist), CPU_COUNT):
    for z in range(i, i+CPU_COUNT):
        doit( filelist[z])

With the function doit() calling up the program to do the lifting. Setting
CPU_COUNT to 1 or 5 (I have 6 cores) makes no difference in total speed.
I'm processing about 1200 files and my total duration is around 2 minutes.
No matter how many cores I use the total is within a 5 second range.

This is not a big deal ... but I really thought that throwing more
processors at a problem was a wonderful thing :) I figure that the cost of
loading the python libraries and my source file and writing it out is
pretty much i/o bound, but that is just a guess.

Maybe I need to set my sights on bigger, slower programs to see a difference
:)

-- 

**** Listen to my FREE CD at http://www.mellowood.ca/music/cedars ****
Bob van der Poel ** Wynndel, British Columbia, CANADA **
EMAIL: bob at mellowood.ca
WWW:   http://www.mellowood.ca