[SciPy-Dev] scipy.special.smirnov, scipy/special/cephes/kolmogorov.c

Tue Jun 6 15:23:58 EDT 2017

Paul van Mulbregt kirjoitti 06.06.2017 klo 00:23:
> 1. The first part involves modifying algorithms and C code in
> scipy/special/cephes/kolmogorov.c.
> The cephes code appears to have undergone mostly maintenance changes the
> last N years. 
> Is there some documentation on policies for the cephes C code? 
> Such as:
> * What kind of changes are being accepted?
> * When to report OVERFLOW or UNDERFLOW?
> * How much commenting is appropriate for algorithm changes?
> * Is it tabs or spaces in the C code? (Both styles appear to be in use.)
> * Are there specs, either explicit or implicit, for the functions in
> cephes? > * What is required for an algorithmic change to be considered?
> * What is required for a new function to be added?
> * Is the test suite expansive enough to detect newly introduced errors?

Bugfixes are OK, and you can add new functions --- although it probably
would be clearer to not put them under the cephes/ folder, unless it's
coupled to the existing implementation.

I would refrain from large reorganizations, even though at this point
the library code is already a fork.

You do not need to add comments in the source code on *changes* you
made. There may be such comments there, but this is not good style
nowadays. As usual, it is however good to add enough comments and
especially the literature references, to make it possible for other
people (years later) to understand what the code does and why.

What changes were made and why however should be explained in the commit
messages in git, which records who did what.

To get an algorithm change accepted, you need to show that (i) the old
behavior is wrong, and (ii) the new behavior is right, in a way that can
be verified later on.

The way to do this is to add appropriate tests in the test suite. E.g.
test against known-good ground truth values, and across "difficult"
parameter regimes as appropriate covering the branches of the
algorithm(s) used.

If you have gcc and lcov installed, you can run 'python runtests.py
--gcov -s special -m full' followed by 'python runtests.py --lcov-html'
to generate coverage report that you can look at what parts of the code
got run.

One possible source of "true" values are other libraries. In particular,
if the function exists in mpmath, you can generate values on-the-line
using it. For data from other sources, you can save values in a text
file under tests/data/local (see the README there on how the data files
are used).

I don't know the status of smirnov, but unless the function has been
touched since 2007 I would not trust the existing tests to test the
function for *correctness*. The old test codes for the stats functions
AFAIK largely just assume the implementations are correct and only
smoketest the bindings.

If it's stats function, it may be tested indirectly via the stats
routines, but this is probably not enough.

So if you want to change the implementation, getting enough test data to
which you can compare against is probably necessary.

> 2. The second part of the change would involve exporting a newly created
> C function smirnovp(), the derivative of smirnov(), to be used in the
> ksone python code.
> Is there documentation on what would be required to export a new cephes
> function?

The process is relatively straightforward once you know what to do:

1) Add the function to the list in generate_ufuncs.py. See the
discussion in that file and look at how the other functions are done as
an example.

2) Add an entry to add_newdocs.py to include the documentation. The
above two are the *only* files you modify manually (apart from the *.h
and *.c files for the implementation itself, of course).

3) Run generate_ufuncs.py to regenerate the other files.

4) (Re)build Scipy as usual.

	Pauli