[Numpy-discussion] Floating point precision expectations in NumPy

Jerry Morrison jerry.morrison+numpy at gmail.com
Fri Jul 30 14:04:06 EDT 2021

On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
> there is a proposal to add some Intel-specific fast math routines to
> NumPy:
>     https://github.com/numpy/numpy/pull/19478
> Numerical algorithms always involve a speed vs. precision trade-off:
> giving a more precise result is slower.
> So the question is what the general precision expectation should be
> in NumPy, and how far it is acceptable to diverge in the
> precision/speed trade-off depending on CPU/system.
> I doubt we can formulate very clear rules here, but any input on what
> precision you would expect, or which trade-offs seem acceptable, would
> be appreciated!
> Some more details
> -----------------
> This is mainly of interest for functions like logarithms,
> trigonometric functions, or cube roots.
> Some basic operations (multiplication, addition) are correct as per the
> IEEE standard and give the best possible result, but functions like
> those above are typically only correct to within very small numerical
> errors.
> This error is typically measured in "ULP":
>      https://en.wikipedia.org/wiki/Unit_in_the_last_place
> where 0.5 ULP would be the best possible result.
> Merging the PR may mean relaxing the current precision slightly in some
> places.  In general Intel advertises 4 ULP of precision (although the
> actual precision for most functions seems better).
> Here are two tables, one from glibc and one for the Intel functions:
> https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> (Mainly the LA column)
> https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vmperfdata/top/real-functions/measured-accuracy-of-all-real-vm-functions.html
> Different implementations give different accuracy, but formulating some
> guidelines/expectations (or referencing them) would be useful guidance.

"Close enough" depends on the application, but non-linear models can exhibit
the "butterfly effect", where tiny differences in intermediate values grow
until the final results diverge.

For a certain class of scientific programming applications, reproducibility
is paramount.
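
To make the butterfly effect concrete, here is a minimal sketch using the
logistic map as a stand-in for a non-linear model. The map, the parameter
r = 4.0, the starting value 0.3, and the 1e-15 perturbation are all
illustrative assumptions of mine, not anything taken from the PR.

```python
import numpy as np

def logistic_orbit(x0, r=4.0, steps=200):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    x = np.float64(x0)
    orbit = [x]
    for _ in range(steps):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return np.array(orbit)

# Two starting points that differ by about 1e-15, the kind of gap a
# slightly less precise math routine might introduce (assumed value).
a = logistic_orbit(0.3)
b = logistic_orbit(0.3 + 1e-15)

gap = np.abs(a - b)
print("initial gap:", gap[0])
print("largest gap:", gap.max())
```

The initial gap is far below any tolerance a test would normally use, yet
the two orbits end up unrelated after enough iterations.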

Development teams may use a variety of development laptops, workstations,
scientific computing clusters, and cloud computing platforms. If the tests
pass on your machine but fail in CI, you have a debugging problem.
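
One way to see that debugging problem is to compare a bitwise-equality
check with a ULP-tolerant one. This is only a sketch: the "other platform"
result is simulated by moving each value one representable step with
np.nextafter, and ulp_distance is a hypothetical helper of mine, not a
NumPy function. np.spacing and np.testing.assert_array_max_ulp are existing
NumPy calls.

```python
import numpy as np

def ulp_distance(result, reference):
    """Error of `result`, in units of the float spacing at `reference`."""
    result = np.asarray(result, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    return np.abs(result - reference) / np.spacing(np.abs(reference))

x = np.linspace(2.0, 10.0, 1000)
reference = np.log(x)

# Simulated result from another machine (assumption): every value is
# off by one representable step.
other = np.nextafter(reference, np.inf)

# A test that demands identical bits fails on the "other" machine.
bitwise_equal = np.array_equal(reference, other)
print("bitwise equal:", bitwise_equal)

# A test with an explicit ULP budget passes; 4 mirrors the advertised
# Intel bound quoted above and is not a NumPy default.
ulps = np.testing.assert_array_max_ulp(reference, other, maxulp=4)
print("max ULP difference:", ulps.max())
print("max ulp_distance:", ulp_distance(other, reference).max())
```

A ULP budget makes such a test portable, but it does not give the
bit-for-bit reproducibility that some applications need.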

If your published scientific article links to source code that replicates
your computation, scientists will expect to be able to run that code, now
or in a couple of decades, and replicate the same outputs. They'll be using
different OS releases and maybe different CPU + accelerator architectures.

Reproducible Science is good. Replicated Science is better.

Clearly there are other applications where it's easy to trade
reproducibility and some precision for speed.