[issue32986] multiprocessing, default assumption of Pool size unhelpful

Wed Mar 7 17:45:37 EST 2018

Nathaniel Smith <njs at pobox.com> added the comment:

> You mean duplicating "nproc"'s logic in Python?

Yeah.

> If someone wants to do the grunt work of implementing/testing it...

Well, that's true of any bug fix / improvement :-). The logic isn't terribly complicated though, something roughly like:

import os

def parse_omp_envvar(env_value):
    # OMP_NUM_THREADS may be a comma-separated list; use the first entry.
    return int(env_value.strip().split(",")[0])

def estimate_cpus():
    limit = float("inf")
    if "OMP_THREAD_LIMIT" in os.environ:
        limit = parse_omp_envvar(os.environ["OMP_THREAD_LIMIT"])

    if "OMP_NUM_THREADS" in os.environ:
        cpus = parse_omp_envvar(os.environ["OMP_NUM_THREADS"])
    else:
        try:
            # Respects the CPU affinity mask; not available on every platform.
            cpus = len(os.sched_getaffinity(os.getpid()))
        except (AttributeError, OSError):
            # cpu_count() can return None, so fall back to 1.
            cpus = os.cpu_count() or 1

    return min(cpus, limit)
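
To illustrate the parsing step: OMP_NUM_THREADS may hold a comma-separated list (one value per nesting level), and the sketch above only looks at the first entry. A quick check, assuming that reading of the variable:

```python
def parse_omp_envvar(env_value):
    # Same helper as in the sketch: keep only the first (outermost) entry.
    return int(env_value.strip().split(",")[0])

print(parse_omp_envvar("8"))        # 8
print(parse_omp_envvar(" 4,3,2 "))  # 4
```

A malformed value (e.g. an empty string) would raise ValueError here; whether to ignore it or propagate it is a design choice the sketch leaves open.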

> There's also the question of how that affects non-scientific workloads. People can use thread pools or process pools for other purposes, such as distributing (blocking) I/O.

We already have some heuristics for this: IIRC the thread pool executor defaults to cpu_count() * 5 threads (b/c Python threads are really intended for I/O-bound workloads), and the process pool executor and multiprocessing.Pool default to cpu_count() processes (b/c processes are better suited to CPU-bound workloads). Neither of these heuristics is perfect. But inasmuch as it makes sense at all to use the cpu count as part of the heuristic, it surely will work better to use a more accurate estimate of the available cpus.
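
As a rough sketch of what that could look like from the caller's side today, one can pass an explicit size instead of relying on the defaults. The names available_cpus, thread_pool_size and process_pool_size are hypothetical helpers for illustration, not existing stdlib functions, and the multipliers simply mirror the heuristics described above:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def available_cpus():
    # Hypothetical helper: prefer the affinity mask where the platform has one.
    try:
        return len(os.sched_getaffinity(0))
    except (AttributeError, OSError):
        return os.cpu_count() or 1

def thread_pool_size(cpus):
    # I/O-bound work: oversubscribe, mirroring the cpu_count() * 5 heuristic.
    return cpus * 5

def process_pool_size(cpus):
    # CPU-bound work: one worker process per available CPU.
    return cpus

def square(x):
    return x * x

if __name__ == "__main__":
    n = available_cpus()
    with ThreadPoolExecutor(max_workers=thread_pool_size(n)) as executor:
        print(list(executor.map(square, range(4))))  # [0, 1, 4, 9]
```

The same number could be handed to multiprocessing.Pool(processes=process_pool_size(n)) for CPU-bound work.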

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue32986>
_______________________________________