[SciPy-Dev] Proposal to add Inverse of Log CDF of Normal Distribution to scipy.special

Steppi, Albert Albert_Steppi at hms.harvard.edu
Sun Apr 25 00:16:45 EDT 2021


Hi,

I'm a software developer employed in an academic laboratory, working primarily on research projects involving machine learning and biomedical text mining. I'm a longtime scipy user and a big fan of your work. One of the leads of my team is working on a problem where it has become important to calculate z-scores from log p-values, where the p-values themselves can at times be very small. The naive solution of applying

scipy.special.ndtri(numpy.exp(log_p))

fails when log_p is less than approximately -745, because numpy.exp(log_p) underflows to zero. I found a solution to this problem by inspecting the underlying C code for scipy.special.ndtri, and my team lead suggested I post an issue to see if there was any interest in adding an inverse of the log CDF of the normal distribution to scipy. That issue can be found here: https://github.com/scipy/scipy/issues/13923. (You can find the details of how the proposed implementation works there.)
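
To make the failure concrete, here is a minimal sketch that reproduces the underflow and then recovers a finite z-score by solving log_ndtr(x) = log_p directly. Note that the Newton iteration below is a generic root-finding approach substituted purely for illustration; it is not the implementation described in the issue, and the name inv_log_ndtr_newton, its starting point, and its tolerance are all assumptions made for this example.

```python
import numpy as np
from scipy.special import log_ndtr, ndtri

log_p = -1000.0

# exp(-1000) is below the smallest positive double, so it rounds to 0.0 ...
p = np.exp(log_p)

# ... and ndtri(0.0) is -inf, so the z-score is lost.
z_naive = ndtri(p)


def inv_log_ndtr_newton(log_p, tol=1e-12, max_iter=100):
    """Hypothetical helper: solve log_ndtr(x) == log_p for x (log_p < 0)."""
    # Start to the left of the root, using ndtr(x) <= exp(-x**2 / 2) / 2
    # for x <= 0. log_ndtr is concave, so Newton steps then approach the
    # root from the left without overshooting it.
    x = -np.sqrt(-2.0 * log_p)
    for _ in range(max_iter):
        log_cdf = log_ndtr(x)
        # Log of the slope of log_ndtr at x: log(pdf(x)) - log(cdf(x)).
        # Working in log space avoids forming the underflowing pdf and cdf.
        log_slope = -0.5 * x * x - 0.5 * np.log(2.0 * np.pi) - log_cdf
        step = (log_cdf - log_p) * np.exp(-log_slope)
        x -= step
        if abs(step) <= tol * max(1.0, abs(x)):
            break
    return x


z = inv_log_ndtr_newton(log_p)
print(p, z_naive, z)
```

The sketch only ever evaluates log_ndtr, which stays finite far into the tail, so the tiny probability itself is never formed. A production version would also need to handle array inputs and values of log_p very close to zero with more care.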

It was pointed out that another user had previously posted an issue,
https://github.com/scipy/scipy/issues/11465, asking for the same function (among other things). 

If there's interest in adding an inverse of the log CDF of the normal distribution to scipy.special, I can submit a PR in the next week or two. It's not clear to me what such a function should be called, though. The related functions that currently exist in scipy.special are ndtr (CDF of the normal distribution), log_ndtr (log of the CDF of the normal distribution), and ndtri (inverse of the CDF of the normal distribution). The name log_ndtri is ambiguous, since it could be read as either the log of the inverse CDF or the inverse of the log CDF; ndtri_exp is unambiguous and possibly acceptable. I've found that in the JuliaStats StatsFuns.jl library this function is called norminvlogcdf, and the analogous functions for all distributions seem to follow the same naming scheme: https://github.com/JuliaStats/StatsFuns.jl/blob/master/src/distrs/norm.jl
(I've checked just now and it appears that the Julia function applies the same technique I propose. I wasn't previously aware of that.) Perhaps whatever name is chosen should be thought of as defining a standard for the names of any inverse log CDF functions that may be added in the future.

I encourage anyone interested in extended discussion to come to the comments section on the related issue https://github.com/scipy/scipy/issues/13923

Thanks,
Albert


Albert Steppi III, Ph.D.
Scientific Software Developer
Laboratory of Systems Pharmacology
Harvard Medical School


