[Numpy-discussion] Floating point precision expectations in NumPy

Jerry Morrison jerry.morrison+numpy at gmail.com
Fri Jul 30 14:04:06 EDT 2021


On Tue, Jul 27, 2021 at 4:55 PM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> Hi all,
>
> there is a proposal to add some Intel-specific fast math routines to
> NumPy:
>
>     https://github.com/numpy/numpy/pull/19478
>
> Part of implementing numerical algorithms is that there is always a
> speed vs. precision trade-off: giving a more precise result is slower.
>
> So there is a question of what the general precision expectation should
> be in NumPy, and how much it is acceptable for the precision/speed
> trade-off to diverge depending on CPU/system.
>
> I doubt we can formulate very clear rules here, but any input on what
> precision you would expect or what trade-offs seem acceptable would be
> appreciated!
>
>
> Some more details
> -----------------
>
> This is mainly interesting for functions like logarithms,
> trigonometric functions, or cube roots.
>
> Some basic functions (multiplication, addition) are correctly rounded
> as per the IEEE standard and give the best possible result, but
> functions like the above are typically only correct to within small
> numerical errors.
>
> This error is typically measured in "ULPs":
>
>      https://en.wikipedia.org/wiki/Unit_in_the_last_place
>
> where 0.5 ULP would be the best possible result.
>
>
> Merging the PR may mean relaxing the current precision slightly in some
> places.  In general Intel advertises 4 ULP of precision (although the
> actual precision for most functions seems better).
>
>
> Here are two tables, one from glibc and one for the Intel functions:
>
>
> https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> https://software.intel.com/content/www/us/en/develop/documentation/onemkl-vmperfdata/top/real-functions/measured-accuracy-of-all-real-vm-functions.html
> (mainly the LA column of the Intel table)
>
>
> Different implementations give different accuracy, but formulating
> some guidelines/expectations (or referencing existing ones) would be
> useful guidance.
>
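
One way to quantify the ULP error mentioned above is to compare a
float32 result against a float64 reference, which is effectively exact
at float32 resolution. A minimal sketch (the sampled range and the
choice of np.log are arbitrary):

    import numpy as np

    # Evaluate float32 log and a float64 reference over random inputs.
    x = np.random.default_rng(0).uniform(0.1, 10.0, 100_000)
    x32 = x.astype(np.float32)
    result = np.log(x32)
    reference = np.log(x32.astype(np.float64))

    # np.spacing(v) is the size of 1 ULP at v; a correctly rounded
    # result stays within 0.5 ULP of the true value.
    err = np.abs(result.astype(np.float64) - reference)
    print("max ULP error:", (err / np.spacing(np.abs(result))).max())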

"Close enough" depends on the application but non-linear models can get the
"butterfly effect" where the results diverge if they aren't identical.

For a certain class of scientific programming applications, reproducibility
is paramount.

Development teams may use a variety of development laptops, workstations,
scientific computing clusters, and cloud computing platforms. If the tests
pass on your machine but fail in CI, you have a debugging problem.
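
One mitigation (a sketch, not a prescription; assert_within_ulp is a
hypothetical helper) is to give tests an explicit ULP budget instead of
demanding bit-identical output, so results that differ only at the
library's advertised accuracy still pass on every machine:

    import numpy as np

    def assert_within_ulp(actual, expected, max_ulp=4):
        # np.spacing(v) is the size of 1 ULP at v.
        tol = max_ulp * np.spacing(np.abs(expected))
        np.testing.assert_array_less(np.abs(actual - expected), tol)

    # Example: float32 exp against stored, correctly rounded references.
    x = np.float32([0.5, 1.0, 2.0])
    assert_within_ulp(np.exp(x), np.float32([1.6487213, 2.7182817, 7.389056]))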

If your published scientific article links to source code that replicates
your computation, scientists will expect to be able to run that code, now
or a couple of decades from now, and replicate the same outputs. They'll
be using different OS releases and maybe different CPU + accelerator
architectures.

Reproducible Science is good. Replicated Science is better.
<http://rescience.github.io/>

Clearly there are other applications where it's easy to trade
reproducibility and some precision for speed.