[SciPy-Dev] speed of nosetests scipy.stats

Wed Jan 12 20:51:29 EST 2011

On Wed, Jan 12, 2011 at 7:56 PM, Skipper Seabold <jsseabold at gmail.com> wrote:
> On Wed, Jan 12, 2011 at 7:32 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> On Wed, Jan 12, 2011 at 18:21,  <josef.pktd at gmail.com> wrote:
>>> On Wed, Jan 12, 2011 at 6:58 PM, Robert Kern <robert.kern at gmail.com> wrote:
>>>> On Wed, Jan 12, 2011 at 17:47,  <josef.pktd at gmail.com> wrote:
>>>>> On Wed, Jan 12, 2011 at 5:52 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>>>> On Wed, Jan 12, 2011 at 2:42 PM,  <josef.pktd at gmail.com> wrote:
>>>>>>
>>>>>>> and what kind of operation produces
>>>>>>> "Warning: invalid value encountered in subtract" ?
>>>>>>>
>>>>>>> I get hundreds of these
>>>>>>
>>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2010-November/054128.html
>>>>>
>>>>> Yes, I will switch soon again to general ignore. But I might have a
>>>>> look at some of the reasons for these warnings in scipy.stats.
>>>>>
>>>>> I know zero division error, but I have no idea what's an invalid value
>>>>> for subtract.
>>>>> I thought we can subtract anything from anything that might be a
>>>>> number or "not a number" or infinite.
>>>>
>>>> It just means that a NaN popped up somewhere in the calculation when
>>>> there was no NaN in the inputs, thus setting a floating point
>>>> exception flag in your FPU, nothing more. Note that the message says
>>>> "in subtract", not "for subtract". It's not a value judgment about
>>>> your inputs.
>>>>
>>>> [~]
>>>> |1> np.seterr(all='print')
>>>> {'divide': 'ignore', 'invalid': 'ignore', 'over': 'ignore', 'under': 'ignore'}
>>>>
>>>> [~]
>>>> |2> np.subtract(np.inf, np.inf)
>>>> Warning: invalid value encountered in subtract
>>>> nan
>>>
>>> Thanks,
>>>
>>> I only tried
>>>
>>>>>> np.seterr()
>>> {'over': 'print', 'divide': 'print', 'invalid': 'print', 'under': 'ignore'}
>>>
>>>>>> np.inf - np.inf
>>> -1.#IND
>>>
>>> which is obviously something different from np.subtract
>>
>> Yes. The operators on scalar objects tend to avoid the ufunc
>> machinery, and it is the ufunc machinery that issues those errors.
>>
>>> A lot of nans in the calculations is not good news. I just wonder if
>>> some of it is because of the switch to logs in some calculations
>>> (logpdf)
>>>
>>>>>> 0**0
>>> 1
>>>>>> 0*np.log(0)
>>> Warning: invalid value encountered in double_scalars
>>> nan
>>
>> Could be. Try np.seterr(invalid='raise'), then you will get
>> tracebacks. Or you can use np.seterr(invalid='call') and
>> np.seterrcall() to set up a function that gets called that will print
>> out the current stack trace (or just record it somewhere so you can
>> aggregate things for easier analysis), but then continue on.
>>
>
> I usually do something like
>
> nosetests scipy.stats --verbosity=3 &> test_output.txt
>
> to see where the warnings are coming from.  Might need to use 2> on
> Windows(?).  I've also seen these for instance during optimizations
> with exp or log or some ufunc in the objective function, but the
> optimum is still reached so they're somewhat spurious (other times
> they weren't so harmless).

Harmless or not, that's what I would like to know, now that I see the warnings.
It looks like the main problems are in fit and optimization. Since all the tests
pass, there don't seem to be any problems (in the nice test cases),
but fit() still needs work.
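
A minimal sketch of the invalid='call' plus np.seterrcall() variant
mentioned above, for counting where the NaNs come from without stopping
at the first one. The names nan_sites, record_fp_error and count_invalid
are made up for illustration; only np.seterrcall and np.errstate are
NumPy API.

```python
import traceback
from collections import Counter

import numpy as np

# Hypothetical bookkeeping: (error kind, file, line) -> number of hits.
nan_sites = Counter()


def record_fp_error(kind, flag):
    # NumPy calls the handler with a description string and a status flag.
    # The last stack entry is this handler, the one before it is the
    # Python code that triggered the floating-point error.
    caller = traceback.extract_stack(limit=2)[0]
    nan_sites[(kind, caller.filename, caller.lineno)] += 1


def count_invalid(func, *args):
    """Run func(*args), recording every 'invalid' event instead of raising."""
    old_handler = np.seterrcall(record_fp_error)
    try:
        with np.errstate(invalid='call'):
            return func(*args)
    finally:
        # Restore whatever handler was installed before.
        np.seterrcall(old_handler)


# inf - inf through the ufunc machinery produces a NaN and gets recorded.
result = count_invalid(np.subtract, np.array([np.inf]), np.array([np.inf]))
for (kind, filename, lineno), count in nan_sites.items():
    print(count, kind, filename, lineno)
```

The calculation continues after each event, so one run gives a table of
locations that can be sorted by count.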

Robert, thanks for the tip with raise.
Using scipy.test(label="slow") with 427 tests, I get

======================================================================
ERROR: test_continuous_extra.test_cont_extra(<scipy.stats.distributions.lomax_gen object at 0x018EDC10>, (1.8771398388773268,), 'lomax loc, scale test')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.py", line 183, in runTest
    self.test(*self.arg)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_continuous_extra.py", line 78, in check_loc_scale
    m,v = distfn.stats(*arg)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 1460, in stats
    mu, mu2, g1, g2 = self._stats(*args)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 3978, in _stats
    mu, mu2, g1, g2 = pareto.stats(c, loc=-1.0, moments='mvsk')
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 1458, in stats
    mu, mu2, g1, g2 = self._stats(*args,**{'moments':moments})
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 3939, in _stats
    vals = 2*(bt+1.0)*sqrt(b-2.0)/((b-3.0)*sqrt(b))
FloatingPointError: invalid value encountered in sqrt

======================================================================
ERROR: test_fit (test_distributions.TestFitMethod)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_distributions.py", line 404, in test_fit
    vals2 = distfunc.fit(res, optimizer='powell')
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 1711, in fit
    vals = optimizer(func,x0,args=(ravel(data),),disp=0)
  File "C:\Programs\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 1519, in fmin_powell
    fval, x, direc1 = _linesearch_powell(func, x, direc1, tol=xtol*100)
  File "C:\Programs\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 1418, in _linesearch_powell
    alpha_min, fret, iter, num = brent(myfunc, full_output=1, tol=tol)
  File "C:\Programs\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 1241, in brent
    brent.optimize()
  File "C:\Programs\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 1142, in optimize
    tmp2 = (x-v)*(fx-fw)
FloatingPointError: invalid value encountered in double_scalars

======================================================================
ERROR: test_fix_fit (test_distributions.TestFitMethod)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_distributions.py", line 424, in test_fix_fit
    vals = distfunc.fit(res,floc=0)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 1711, in fit
    vals = optimizer(func,x0,args=(ravel(data),),disp=0)
  File "C:\Programs\Python25\Lib\site-packages\scipy\optimize\optimize.py", line 280, in fmin
    and max(abs(fsim[0]-fsim[1:])) <= ftol):
FloatingPointError: invalid value encountered in subtract

----------------------------------------------------------------------
Ran 427 tests in 910.359s
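
One plausible way for the fmin line above to raise is two simplex
vertices that both have an infinite objective value, since inf - inf is
NaN. That is an assumption, not something these tracebacks confirm; the
fsim values below are invented purely to reproduce the message.

```python
import numpy as np

# Hypothetical objective values at three simplex vertices; the two
# infinities stand in for a likelihood evaluated outside the support.
fsim = np.array([np.inf, np.inf, 3.0])

message = None
with np.errstate(invalid='raise'):
    try:
        # Same expression shape as the convergence check in the traceback.
        spread = np.max(np.abs(fsim[0] - fsim[1:]))
    except FloatingPointError as exc:
        message = str(exc)

print(message)
```

If that is what happens inside fit(), the NaN only appears in the
convergence check and need not affect the optimum that is finally
returned.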

The non-slow tests seem to produce NaNs only when they are supposed to,
except for pareto.stats:

======================================================================
ERROR: Failure: FloatingPointError (invalid value encountered in sqrt)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\loader.py", line 224, in generate
    for test in g():
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_continuous_basic.py", line 171, in test_cont_basic
    m,v = distfn.stats(*arg)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 1460, in stats
    mu, mu2, g1, g2 = self._stats(*args)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 3978, in _stats
    mu, mu2, g1, g2 = pareto.stats(c, loc=-1.0, moments='mvsk')
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 1458, in stats
    mu, mu2, g1, g2 = self._stats(*args,**{'moments':moments})
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\distributions.py", line 3939, in _stats
    vals = 2*(bt+1.0)*sqrt(b-2.0)/((b-3.0)*sqrt(b))
FloatingPointError: invalid value encountered in sqrt

======================================================================
ERROR: Check nanmean when all values are nan.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_stats.py", line 161, in test_nanmean_all
    m = stats.nanmean(self.Xall)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", line 274, in nanmean
    return np.mean(x,axis)/factor
FloatingPointError: invalid value encountered in double_scalars

======================================================================
ERROR: Check nanstd when all values are nan.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_stats.py", line 176, in test_nanstd_all
    s = stats.nanstd(self.Xall)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", line 323, in nanstd
    m1 = np.sum(x,axis)/n
FloatingPointError: invalid value encountered in double_scalars

======================================================================
ERROR: test_stats.test_ttest_rel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.py", line 183, in runTest
    self.test(*self.arg)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_stats.py", line 1355, in test_ttest_rel
    assert_almost_equal(stats.ttest_rel([0,0,0], [0,0,0]), (1.0, 0.42264973081037421))
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", line 3002, in ttest_rel
    t = dm / np.sqrt(v/float(n))
FloatingPointError: invalid value encountered in double_scalars

======================================================================
ERROR: test_stats.test_ttest_ind
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.py", line 183, in runTest
    self.test(*self.arg)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_stats.py", line 1396, in test_ttest_ind
    assert_almost_equal(stats.ttest_ind([0,0,0], [0,0,0]), (1.0, 0.37390096630005898))
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", line 2921, in ttest_ind
    t = d/np.sqrt(svar*(1.0/n1 + 1.0/n2))
FloatingPointError: invalid value encountered in double_scalars

======================================================================
ERROR: test_stats.test_ttest_1samp_new
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\programs\python25\lib\site-packages\nose-0.11.1-py2.5.egg\nose\case.py", line 183, in runTest
    self.test(*self.arg)
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\tests\test_stats.py", line 1435, in test_ttest_1samp_new
    assert_almost_equal(stats.ttest_1samp([0,0,0], 0), (1.0, 0.42264973081037421))
  File "C:\Programs\Python25\Lib\site-packages\scipy\stats\stats.py", line 2831, in ttest_1samp
    t = d / np.sqrt(v/float(n))
FloatingPointError: invalid value encountered in double_scalars

----------------------------------------------------------------------
Ran 969 tests in 59.281s

FAILED (SKIP=1, errors=6)
<nose.result.TextTestResult run=969 errors=6 failures=0>
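
The sqrt failure in pareto comes from sqrt(b-2.0) with b = 1.877...,
that is, from evaluating the skewness formula where the skewness does
not exist (it needs b > 3). A minimal sketch of a guarded version,
assuming NaN is the value to return outside that range; the function
name pareto_skew is made up and this is not the code in distributions.py.

```python
import numpy as np


def pareto_skew(b):
    """Skewness of a Pareto distribution with shape b, NaN where b <= 3."""
    b = np.atleast_1d(np.asarray(b, dtype=float))
    skew = np.full(b.shape, np.nan)
    ok = b > 3
    bt = b[ok]
    # Same expression as in the traceback, evaluated only where it is valid,
    # so sqrt never sees a negative argument.
    skew[ok] = 2 * (bt + 1.0) * np.sqrt(bt - 2.0) / ((bt - 3.0) * np.sqrt(bt))
    return skew


with np.errstate(all='raise'):
    # The shape parameter from the failing lomax test, plus a valid one.
    print(pareto_skew([1.8771398388773268, 4.0]))
```

Masking before the sqrt, rather than fixing up the result afterwards,
keeps the floating-point flags clean even with invalid='raise'.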

There are still some tests missing compared to nosetests, but I won't
be able to set np.seterr for command-line nosetests, so Skipper's
recommendation is still on the to-do list.

There are still additional overflow and division warnings, but I think
many of them will be expected.
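
Where a NaN really is the expected result, for example the 0/0 in the
t-tests on all-zero samples above, a local np.errstate can silence just
that operation instead of a general ignore. A minimal sketch; the helper
t_statistic_1samp is hypothetical and only mirrors the expression shown
in the ttest_1samp traceback.

```python
import numpy as np


def t_statistic_1samp(a, popmean):
    """One-sample t statistic; NaN for a constant sample equal to popmean."""
    a = np.asarray(a, dtype=float)
    n = a.size
    d = a.mean() - popmean
    v = a.var(ddof=1)
    # Zero variance gives 0/0 or x/0 here; that is expected, so ignore
    # the flags for this one division only.
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.divide(d, np.sqrt(v / n))


with np.errstate(all='raise'):
    # No FloatingPointError, even though the outer setting is 'raise'.
    print(t_statistic_1samp([0, 0, 0], 0))
    print(t_statistic_1samp([1, 2, 3], 0))
```

Unexpected NaNs elsewhere in the same run still raise, which keeps the
expected cases from hiding the real problems.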

>
> Skipper
>
> PS
>
> Ran 1572 tests in 212.470s

Thanks for checking. In the meantime I found a runaway Python process;
most likely I hadn't killed the previous fit test completely, and it
was using most of my (single) CPU time. So, that's cleared up.

Josef
