[SciPy-Dev] Subversion scipy.stats irregular problem with source code example

James Phillips zunzun at zunzun.com
Mon Oct 11 15:45:20 EDT 2010


The genetic algorithm approach is not working as a general solution to
the problem of finding starting parameters for fmin() for statistical
distributions, presumably due to extreme parameter sensitivity.  I do
not see a general solution to the problem given these results.  See
the attached Python file, also copied below.

My results:

Digits of precision test for the beta distribution
nnlf native    = inf
nnlf 16 digits = 10.14091764
nnlf 15 digits = 10.3222074111
nnlf 14 digits = 10.977829575
nnlf 13 digits = inf
nnlf 12 digits = 13.198954184
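The flip from a finite value at 14 digits to inf at 13 digits looks like a support-boundary effect rather than noise: nnlf returns inf as soon as any data point lands on or outside the support implied by loc and scale, and here loc + scale sits barely above the largest data point (3.017), so rounding the parameters can push the sum across that line. A quick check of the sums (illustration only, not part of the attached script):

p3 = 6.06094740472452820E-01   # loc
p4 = 2.41090525952754753E+00   # scale
print p3 + p4                                    # barely above 3.017 -> all data inside the support
print float("%.14E" % p3) + float("%.14E" % p4)  # still above 3.017 -> finite nnlf
print float("%.13E" % p3) + float("%.13E" % p4)  # below 3.017 -> nnlf = inf
print float("%.12E" % p3) + float("%.12E" % p4)  # back above 3.017 -> finite again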


     James

import scipy, scipy.stats

data = scipy.array([
3.017,2.822,2.632,2.287,2.207,2.048,
1.963,1.784,1.712,2.972,2.719,2.495,
2.070,1.969,1.768,1.677,1.479,1.387,
2.843,2.485,2.163,1.687,1.408,1.279,
1.016,0.742,0.607])

# parameters, in the order beta.nnlf expects: [a, b, loc, scale]
p1 = 7.69589403034175001E-01
p2 = 5.52884409849620395E-01
p3 = 6.06094740472452820E-01
p4 = 2.41090525952754753E+00

print "Digits of precision test for the beta distribution"
print "nnlf native    =", scipy.stats.beta.nnlf([p1, p2, p3, p3], data)
print "nnlf 16 digits =", scipy.stats.beta.nnlf([float("%.16E" % p1),
float("%.16E" % p2), float("%.16E" % p3), float("%.16E" % p4)], data)
print "nnlf 15 digits =", scipy.stats.beta.nnlf([float("%.15E" % p1),
float("%.15E" % p2), float("%.15E" % p3), float("%.15E" % p4)], data)
print "nnlf 14 digits =", scipy.stats.beta.nnlf([float("%.14E" % p1),
float("%.14E" % p2), float("%.14E" % p3), float("%.14E" % p4)], data)
print "nnlf 13 digits =", scipy.stats.beta.nnlf([float("%.13E" % p1),
float("%.13E" % p2), float("%.13E" % p3), float("%.13E" % p4)], data)
print "nnlf 12 digits =", scipy.stats.beta.nnlf([float("%.12E" % p1),
float("%.12E" % p2), float("%.12E" % p3), float("%.12E" % p4)], data)




On Fri, Oct 1, 2010 at 7:58 AM, James Phillips <zunzun at zunzun.com> wrote:
> On Fri, Oct 1, 2010 at 7:21 AM, James Phillips <zunzun at zunzun.com> wrote:
>>
>> Here are current timing results fitting both loc and scale...
>
> I'm iterating over all continuous distributions now, and the genetic
> algorithm results are showing which distributions can be run with loc
> = min(data) and scale = max(data) - min(data).  With that information
> in hand I can then speed up the overall fitting considerably by not
> fitting those parameters.
>
>     James
>
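
For the distributions where that heuristic holds, one way to exploit it is to hold loc and scale fixed and fit only the shape parameters. A rough sketch, assuming a SciPy version whose fit() accepts the floc/fscale keywords and using a subset of the data above for brevity; the small padding is there because data sitting exactly on the support endpoints makes the beta likelihood degenerate, which is part of what decides whether a distribution tolerates this choice at all:

import scipy, scipy.stats

data = scipy.array([3.017, 2.822, 2.632, 2.287, 2.207, 2.048,
                    1.963, 1.784, 1.712, 1.016, 0.742, 0.607])

eps = 1e-6
loc0 = data.min() - eps                      # loc = min(data), padded slightly
scale0 = data.max() - data.min() + 2 * eps   # scale = max(data) - min(data), padded slightly

# Hold loc and scale fixed so the optimizer only searches the shape parameters.
a, b, loc, scale = scipy.stats.beta.fit(data, floc=loc0, fscale=scale0)
print "fitted shapes a, b =", a, b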

