[issue37295] Possible optimizations for math.comb()

Tim Peters report at bugs.python.org
Thu Dec 30 11:51:29 EST 2021


Tim Peters <tim at python.org> added the comment:

[Mark]
> I ran some timings for comb(k, 67) on my macOS / Intel MacBook Pro,
> using timeit to time calls to a function that looked like this:
>
> def f(comb):
>     for k in range(68):
>         for _ in range(256):
>             comb(k, 67)
>             comb(k, 67)
>            ... # 64 repetitions of comb(k, 67) in all

I'm assuming you meant to write comb(67, k) instead, since the comb(k, 67) given is 0 at all tested k values except for k=67, and almost never executes any of the code in question.

It's surprising to me that even the long-winded popcount code was faster! The other way needs to read up 3 1-byte values from a trailing zero table, but the long-winded popcount emulation needs to read up 4 4-byte mask constants (or are they embedded in the instruction stream?), in addition to doing many more bit-fiddling operations (4 shifts, 4 "&" masks, 3 add/subtract, and a multiply - compared to just 2 add/subtract).

So if the results are right, Intel timings make no sense to me at all ;-)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue37295>
_______________________________________


More information about the Python-bugs-list mailing list