Multiprocessing and memory management

Israel Brewster ijbrewster at alaska.edu
Wed Jul 3 12:37:50 EDT 2019


I have a script that benefits greatly from multiprocessing (it’s generating a bunch of images from data). Of course, as expected, each process uses a chunk of memory, and the more processes there are, the more memory is used. The amount used per process can vary from around 3 GB (yes, gigabytes) to over 40 or 50 GB, depending on the amount of data being processed (usually closer to 10 GB; the 40/50 GB case is fairly rare). This puts me in a position of needing to balance the number of processes against memory usage, so that I maximize resource utilization (running one process at a time would simply take WAY too long) while not overloading RAM (which at best would slow things down due to swapping).

Obviously this process will be run on a machine with lots of RAM, but since I don’t know how large the datasets fed to it will be, I wanted to see if I could build some intelligence into the program so that it doesn’t overload memory. A couple of approaches I’ve thought of:

1) Determine the total amount of RAM in the machine (how?), assume an average of 10 GB per process, and only launch as many processes as are calculated to fit (see the first sketch after this list). Easy, but it would run the risk of under-utilizing the processing capabilities and taking longer to run if most of the processes were using significantly less than 10 GB.

2) Somehow monitor the memory usage of the various processes, and if one process needs a lot, pause the others until that one is complete (a rough sketch of this also follows the list). Of course, I’m not sure if this is even possible.

3) Other approaches?
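
For option 1, something like the following is what I had in mind — a minimal sketch assuming the third-party psutil package is installed (it isn’t in the stdlib) and the 10 GB average holds; make_image and datasets are just stand-ins for the real code:

    # Sketch of option 1: size the worker pool from total RAM and an assumed
    # per-process footprint. Requires psutil (pip install psutil).
    import multiprocessing
    import psutil

    ASSUMED_GB_PER_WORKER = 10  # rough average footprint; tune for the real data

    def max_workers(gb_per_worker=ASSUMED_GB_PER_WORKER):
        """Pick a worker count limited by both total RAM and CPU count."""
        total_gb = psutil.virtual_memory().total / 1024 ** 3
        by_memory = int(total_gb // gb_per_worker)
        return max(1, min(by_memory, multiprocessing.cpu_count()))

    def make_image(dataset):
        """Placeholder for the real image-generation work."""
        return len(dataset)

    if __name__ == "__main__":
        datasets = [[0] * 10, [0] * 20]  # stand-in for the real datasets
        with multiprocessing.Pool(processes=max_workers()) as pool:
            results = pool.map(make_image, datasets)
        print(results)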

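For option 2, psutil also exposes suspend()/resume() on a Process (SIGSTOP/SIGCONT on Unix), so maybe a monitor loop in the parent could pause workers when available RAM drops below a threshold. Again just a rough sketch with made-up watermarks, polling interval, and worker function — and note that a suspended worker still holds the memory it has already allocated; pausing only stops it from growing further:

    # Sketch of option 2: pause all but the largest worker when free RAM runs
    # low, and resume once memory frees up. Requires psutil.
    import multiprocessing
    import time
    import psutil

    LOW_WATER_GB = 8    # pause extra workers when available RAM drops below this
    HIGH_WATER_GB = 16  # resume them once at least this much is available again

    def make_image(dataset):
        """Placeholder for the real image-generation work."""
        return len(dataset)

    def monitor(workers, poll_seconds=5):
        """Suspend all but the biggest consumer while memory is tight."""
        paused = []
        while any(w.is_alive() for w in workers):
            avail_gb = psutil.virtual_memory().available / 1024 ** 3
            live = [psutil.Process(w.pid) for w in workers if w.is_alive()]
            if avail_gb < LOW_WATER_GB and len(live) > 1:
                # Let the largest process keep running; pause the rest so they
                # stop allocating (they still hold what they already have).
                live.sort(key=lambda p: p.memory_info().rss, reverse=True)
                for p in live[1:]:
                    if p not in paused:
                        p.suspend()
                        paused.append(p)
            elif avail_gb > HIGH_WATER_GB and paused:
                for p in paused:
                    if p.is_running():
                        p.resume()
                paused = []
            time.sleep(poll_seconds)

    if __name__ == "__main__":
        datasets = [[0] * 10, [0] * 20]  # stand-in data
        workers = [multiprocessing.Process(target=make_image, args=(d,))
                   for d in datasets]
        for w in workers:
            w.start()
        monitor(workers)
        for w in workers:
            w.join()

A real version would need more care (e.g. making sure the monitor never leaves every remaining worker suspended, and handling workers that exit between checks), but it at least suggests the pause idea is possible.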

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145



