[SciPy-Dev] Subversion scipy.stats irregular problem with source code example

Fri Oct 1 14:07:20 EDT 2010

On Fri, Oct 1, 2010 at 1:21 PM, James Phillips <zunzun at zunzun.com> wrote:
> On Fri, Oct 1, 2010 at 6:55 AM,  <josef.pktd at gmail.com> wrote:
>> Which version of scipy are you using for testing?
>
> Scipy 0.7.0 on Ubuntu Lucid Lynx with Python 2.6.5.
>
>
>> I had to fix some python 2.5 incompatibilities (*args), but when I ran
>> the script it seemed to get stuck after printing 4 or 5 iterations. I
>> let it run for a few minutes but then killed the process.
>
> At the top of diffev.py's solve() method is a loop that runs once per
> generation of the genetic algorithm. You might consider placing a
> "print gen" statement there to see whether it is running or stuck.
>
> The virtual server I'm using has 4 CPUs, so I plan to run the
> distribution fitting in parallel.  Here are the current timing results
> for fitting both loc and scale. Run in parallel, these would have taken
> 207 seconds in total (the gamma time, which is the longest):

Much better than my old notebook.

>
> distribution: powerlaw took 69.5095369816 seconds to run
> de parameters [ 1.31332767  0.55933617  2.45766383]
> nnlf: 23.3610045864
>
> distribution: beta took 47.2591450214 seconds to run
> de parameters [ 0.70567183  0.6011756   0.607       2.41      ]
> nnlf: 7.45846316363
>
> distribution: gamma took 206.991358995 seconds to run
> de parameters [  1.40727512e+02  -5.83784812e+00   5.55091421e-02]
> nnlf: 26.7597491247
>
> distribution: pareto took 86.9141609669 seconds to run
> de parameters [  7.75987131e+13  -1.05369726e+14   1.05369726e+14]
> nnlf: 35.2599606822

I did manage to run gamma. The parameters change a lot from generation
to generation. It could also be that, with your small sample, some
parameters are not well identified.

However, with my scipy version I'm getting some NaNs for which I
don't see any reason at all.

>>> scipy.stats.gamma.pdf(np.linspace(0,5,11), 300.5609591140931425, loc=0, scale=0.5)
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> scipy.stats.gamma.pdf(np.linspace(0,5,11), 300.5609591140931425, loc=0, scale=0.25)
array([  0.,   0.,   0.,   0.,   0.,   0.,  NaN,  NaN,  NaN,  NaN,  NaN])
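
One possible source of NaNs like these is overflow in a direct
evaluation of the density. This is an assumption, not a statement about
how any particular scipy version computes gamma.pdf. The sketch below
evaluates the textbook formula y**(a-1) * exp(-y) / Gamma(a) directly
and then again in log space via scipy.special.gammaln. The helper name
gamma_logpdf is made up for illustration.

```python
import numpy as np
from scipy.special import gamma as gamma_fn, gammaln

a = 300.5609591140931425
scale = 0.25
x = np.linspace(0, 5, 11)
y = x / scale  # standardized variable (loc = 0)

# Direct evaluation: Gamma(a) overflows to inf for a this large, and
# y**(a-1) also overflows once y >= 12, so inf / inf yields NaN.
with np.errstate(over="ignore", invalid="ignore"):
    naive = y ** (a - 1.0) * np.exp(-y) / gamma_fn(a) / scale

def gamma_logpdf(x, a, loc=0.0, scale=1.0):
    """Log of the gamma density, computed without huge intermediates."""
    y = (np.asarray(x, dtype=float) - loc) / scale
    out = np.full(y.shape, -np.inf)  # log-density is -inf outside y > 0
    pos = y > 0
    out[pos] = ((a - 1.0) * np.log(y[pos]) - y[pos]
                - gammaln(a) - np.log(scale))
    return out

logp = gamma_logpdf(x, a, loc=0, scale=scale)
pdf = np.exp(logp)  # tiny or zero values, but never NaN

print(naive)
print(pdf)
```

Working with the log-density until the last step keeps every
intermediate value in a range that doubles can represent.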

Given that you are doing a randomized search, there might not be
enough restrictions to ensure that the parameters make sense. I did some
fuzz testing when I got started with distributions, but I haven't run
those tests in a long time. Your random parameters might hit some range
where the numerical accuracy and correctness of the results have never
been checked. It's possible that fmin could get stuck with some NaNs in
the results. I don't know how many dark corners are left.
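
One way to keep NaNs away from the optimizer is to wrap the objective
so that it always returns a finite number. The following is only a
sketch: guarded_nnlf and PENALTY are hypothetical names, the penalty
value is arbitrary, and the parameter layout (shapes..., loc, scale) is
assumed to match what the distribution's nnlf(theta, x) method expects.

```python
import numpy as np
from scipy import stats

PENALTY = 1e10  # arbitrary large finite value (an assumption)

def guarded_nnlf(params, data, dist=stats.gamma):
    """Negative log-likelihood that never hands NaN or inf to the optimizer."""
    params = np.asarray(params, dtype=float)
    scale = params[-1]
    # Reject parameter vectors that cannot describe a valid distribution.
    if not np.all(np.isfinite(params)) or scale <= 0:
        return PENALTY
    with np.errstate(all="ignore"):
        value = dist.nnlf(params, data)
    # nnlf can return inf (e.g. data outside the support) or NaN.
    return float(value) if np.isfinite(value) else PENALTY

data = np.random.RandomState(0).gamma(2.0, size=50)
print(guarded_nnlf([2.0, 0.0, 1.0], data))   # valid parameters
print(guarded_nnlf([2.0, 0.0, -1.0], data))  # invalid scale
```

A flat penalty gives the search no gradient to follow back into the
valid region, which matters less for a population-based search than
for fmin.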

Josef
