[Numpy-discussion] NEP 38 - Universal SIMD intrinsics

Hameer Abbasi einstein.edison at gmail.com
Tue Feb 4 14:59:54 EST 2020


—snip—

> 1) Once NumPy adds the framework and an initial set of Universal Intrinsics, if contributors want to leverage a new architecture-specific SIMD instruction, will they be expected to add a software implementation of this instruction for all other architectures too?

In my opinion, if the instructions are lower in the hierarchy, then yes. For example, one cannot add AVX-512 without also adding AVX-256, AVX-128, and SSE*. However, I would not expect one person or team to be an expert in every instruction set, so intrinsics for one architecture can be developed independently of another.
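To make the layering concrete, here is a minimal sketch of the idea behind a universal-intrinsic layer: one portable name per operation, resolved at compile time to whichever extension is available. This is not NumPy's actual API; the simd_* names, the SIMD_WIDTH macro, and the choice of feature-test macros are illustrative assumptions, though the underlying _mm*/v* intrinsics are the real architecture-specific ones.

    /* Hypothetical portable layer: one name per operation, mapped at
     * compile time onto whichever SIMD extension the compiler reports.
     * The simd_* names are illustrative, not NumPy's real ones. */
    #if defined(__AVX__)
        #include <immintrin.h>
        typedef __m256 simd_f32;                      /* 8 floats per register */
        #define SIMD_WIDTH 8
        #define simd_load_f32(p)     _mm256_loadu_ps(p)
        #define simd_add_f32(a, b)   _mm256_add_ps(a, b)
        #define simd_store_f32(p, v) _mm256_storeu_ps(p, v)
    #elif defined(__SSE2__)
        #include <emmintrin.h>
        typedef __m128 simd_f32;                      /* 4 floats per register */
        #define SIMD_WIDTH 4
        #define simd_load_f32(p)     _mm_loadu_ps(p)
        #define simd_add_f32(a, b)   _mm_add_ps(a, b)
        #define simd_store_f32(p, v) _mm_storeu_ps(p, v)
    #elif defined(__ARM_NEON)
        #include <arm_neon.h>
        typedef float32x4_t simd_f32;                 /* 4 floats per register */
        #define SIMD_WIDTH 4
        #define simd_load_f32(p)     vld1q_f32(p)
        #define simd_add_f32(a, b)   vaddq_f32(a, b)
        #define simd_store_f32(p, v) vst1q_f32(p, v)
    #endif
    /* If no extension is detected, SIMD_WIDTH stays undefined and a
     * scalar fallback loop is used instead. */

Adding support for a new architecture then means filling in one more branch of the mapping, without touching the loops written against the portable names.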

> 2) On whom does the burden lie to ensure that new implementations are benchmarked and show benefits on every architecture? What happens if optimizing a ufunc improves performance on one architecture and worsens it on another?

I would look at this from a maintainability point of view. If we are increasing the code size by 20% for a certain ufunc, there must be a demonstrable 20% increase in performance on any CPU. That is to say, micro-optimisation will be unwelcome, and code readability will be preferred. Usually we ask the submitter of a PR to benchmark it on a machine they have on hand, and I would be inclined to keep this practice of self-reporting. Of course, if someone else then reported a performance regression of, say, 10% on their machine, we would have grown the code by 20% for only about a 5% net gain in performance (averaging the +20% and -10% results), and the PR would have to be reverted.
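As an illustration of the code-size argument, continuing the hypothetical sketch above (same assumed simd_* names and SIMD_WIDTH macro): a ufunc-style inner loop written once against the portable layer compiles to whichever extension the preprocessor selected, with a plain scalar loop serving as both the tail handler and the no-SIMD fallback.

    #include <stddef.h>

    /* One inner loop instead of one hand-written variant per extension. */
    static void add_f32(const float *a, const float *b, float *out, size_t n)
    {
        size_t i = 0;
    #ifdef SIMD_WIDTH
        for (; i + SIMD_WIDTH <= n; i += SIMD_WIDTH) {
            simd_f32 va = simd_load_f32(a + i);
            simd_f32 vb = simd_load_f32(b + i);
            simd_store_f32(out + i, simd_add_f32(va, vb));
        }
    #endif
        for (; i < n; i++) {           /* scalar tail / no-SIMD fallback */
            out[i] = a[i] + b[i];
        }
    }

One such loop is much easier to review and benchmark than a separate hand-tuned variant per architecture, which is where the 20%-code-for-20%-performance rule of thumb comes from.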

—snip—