[SciPy-User] Python significance / error interval / confidence interval module?

Christoph Deil Deil.Christoph at googlemail.com
Mon Jun 20 19:16:02 EDT 2011


On Jun 17, 2011, at 8:12 PM, josef.pktd at gmail.com wrote:
> On Fri, Jun 17, 2011 at 1:08 PM, Bruce Southey <bsouthey at gmail.com> wrote:
>> On 06/17/2011 11:21 AM, josef.pktd at gmail.com wrote:
>>> On Fri, Jun 17, 2011 at 11:12 AM, Gael Varoquaux
>>> <gael.varoquaux at normalesup.org>  wrote:
>>>> On Fri, Jun 17, 2011 at 05:08:16PM +0200, Christoph Deil wrote:
>>>>>     I am looking for a python module for significance / error interval /
>>>>>     confidence interval computation.
>>>> How about http://pypi.python.org/pypi/uncertainties/
>>>> 
>>>>>     Specifically I am looking for Poisson rate estimates in the presence of
>>>>>     uncertain background and / or efficiency, e.g. for an "on/off
>>>>>     measurement".
>>>> Wow, that seems a bit more involved than Gaussian error statistics. I am
>>>> not sure that the above package will solve your problem.
>>>> 
>>>>>     The standard method of Rolke I am mainly interested in is available in
>>>>>     ROOT and RooStats, a C++ high energy physics data analysis package:
>>>> If you really need proper Poisson-rate errors, then you might indeed need
>>>> to translate the Rolke method to Python. How about contributing it to
>>>> uncertainties?

Gael, the uncertainties package ( http://packages.python.org/uncertainties/ ) is only for error propagation, 
not error computation, so I don't think methods for Poisson-rate error computation would fit there.

By the way: everyone doing data analysis needs to propagate errors sometimes.
In my opinion uncertainties is so useful that its functionality should be included in scipy.
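To make the distinction concrete, here is a minimal sketch of the kind of calculation uncertainties automates: first-order (linear) Gaussian error propagation for independent inputs, sigma_f^2 = sum_i (df/dx_i)^2 * sigma_i^2. This is a generic illustration with numerical derivatives, not the actual implementation of the uncertainties package (which uses automatic differentiation); the function names are mine.

```python
import numpy as np

def propagate(f, x, sigma, eps=1e-6):
    """Propagate independent 1-sigma errors `sigma` through f(x)
    using the linear (first-order Taylor) approximation."""
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    grad = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # central finite difference for df/dx_i
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return f(x), np.sqrt(np.sum((grad * sigma) ** 2))

# Example: area of a rectangle with uncertain sides, A = a * b
value, err = propagate(lambda v: v[0] * v[1], [2.0, 3.0], [0.1, 0.2])
# err = sqrt((3*0.1)^2 + (2*0.2)^2) = 0.5
```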

>>> It's a very specific model, and I doubt it's covered by any general
>>> packages, but implementing
>>> http://lanl.arxiv.org/abs/physics/0403059
>>> assuming this is the background for it, doesn't sound too difficult.
>>> 
>>> The main work, it looks like, is keeping track of all the different
>>> models and parameterizations.
>>> scipy.stats.distributions and scipy.optimize (fmin, fsolve) will cover
>>> much of the calculations.
>>> 
>>> (But then of course there is testing and taking care of corner cases
>>> which takes at least several times as long as the initial
>>> implementation, in my experience.)
>>> 
>>> Josef
>> Actually I am more interested in how this differs from a generalized
>> linear model, where modeling a Poisson or negative binomial
>> distribution is feasible.
>> Bruce
>> Bruce
> 
> That was my first guess, but the paper is quite different: the
> assumption there is that two variables X, Y are observed, each with
> its own independent distribution, but sharing some parameters
> 
> X ∼ Pois(μ + b), Y ∼ Pois(b)
> 
> or variations on this like
> X ∼ Pois(eμ + b), Y ∼ N(b, sigma_b),  Z ∼ N(e, sigma_e)
> 
> The rest is mostly profile likelihood, from a quick skim of the
> paper: to get confidence intervals on mu, the nuisance parameters
> are profiled out.
> 
> Josef

Josef, thanks a lot for your helpful comments!
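For the simplest on/off case you describe, X ∼ Pois(μ + b), Y ∼ Pois(b), the profile-likelihood interval can indeed be sketched with scipy.stats and scipy.optimize. This is only an illustration of the construction, not the Rolke et al. implementation: the background ratio is taken as 1, the μ grid and the 68% threshold (Δ(2·NLL) ≤ 1) are my choices.

```python
import numpy as np
from scipy.stats import poisson
from scipy.optimize import minimize_scalar

def nll(mu, b, x, y):
    """Negative log-likelihood of the on/off measurement:
    X ~ Pois(mu + b) in the on region, Y ~ Pois(b) in the off region."""
    return -(poisson.logpmf(x, mu + b) + poisson.logpmf(y, b))

def profile_nll(mu, x, y):
    """Profile out the nuisance parameter b for fixed signal mu."""
    res = minimize_scalar(lambda b: nll(mu, b, x, y),
                          bounds=(1e-10, x + y + 10.0), method="bounded")
    return res.fun

def confidence_interval(x, y, step=0.5):
    """Approximate 68% interval: mu values with 2*(profile - min) <= 1."""
    mus = np.arange(0.0, x + 5 * np.sqrt(x + 1) + 1, step)
    prof = np.array([profile_nll(mu, x, y) for mu in mus])
    inside = mus[2 * (prof - prof.min()) <= 1.0]
    return inside.min(), inside.max()

# e.g. 10 counts on, 4 counts off (MLE for mu is x - y = 6)
lo, hi = confidence_interval(x=10, y=4)
```

The corner cases Josef mentions (μ at the physical boundary, empty observations, coarse grids) are exactly where a quick sketch like this needs the most extra care.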



