[SciPy-User] Unit testing of Bayesian estimator

Bruce Southey bsouthey at gmail.com
Sat Nov 7 22:23:25 EST 2009


On Fri, Nov 6, 2009 at 6:13 PM, Anne Archibald
<aarchiba at physics.mcgill.ca> wrote:
> Hi,
>
> I have implemented a simple Bayesian regression program (it takes
> events modulo one and returns a posterior probability that the data is
> phase-invariant plus a posterior distribution for two parameters
> (modulation fraction and phase) in case there is modulation).

I do not know your field, I am a little rusty on certain issues, and I
do not consider myself a Bayesian.

Exactly what type of Bayesian approach did you use? I also do not know
how you implemented it, in particular whether it is empirical Bayes or
Markov chain Monte Carlo.

> I'm
> rather new at this, so I'd like to construct some unit tests. Does
> anyone have any suggestions on how to go about this?

Since this is a test, the theoretical 'correctness' is irrelevant. So
I would guess that you should use very informative priors and data
with a huge amount of information. That should make the posterior
extremely narrow, so your modal estimate lies very close to the true
value within a very small range.
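
For example, a minimal sketch of such a test, using a conjugate
Beta-Binomial model purely as a stand-in for your estimator (your
actual model and API are not shown in this thread): with an
informative prior and a very informative data set, the posterior ends
up extremely narrow and centred essentially on the true value.

import numpy as np
from scipy import stats

def test_posterior_concentrates_on_truth():
    rng = np.random.RandomState(0)
    p_true = 0.3          # "true" value used to simulate the data
    n = 100000            # lots of data -> a very informative likelihood
    k = rng.binomial(n, p_true)

    # Informative Beta prior centred on the truth (prior weight of 1000 trials).
    a, b = 300.0 + k, 700.0 + n - k
    posterior = stats.beta(a, b)

    # Posterior mode of a Beta(a, b) density with a, b > 1.
    mode = (a - 1.0) / (a + b - 2.0)

    # The mode should sit essentially on top of p_true, and a central
    # 99.9% credible interval should be very tight around it.
    lo, hi = posterior.ppf([0.0005, 0.9995])
    assert abs(mode - p_true) < 0.01
    assert lo < p_true < hi
    assert hi - lo < 0.02

test_posterior_concentrates_on_truth()

The same pattern applies to your real estimator: simulate with known
parameters, run the code, and assert that the posterior summary lands
inside a tolerance you are willing to defend.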

After that it really depends on the algorithm, the data used and what
you need to test. Basically you just have to say that given this set
of inputs I get this 'result' that I consider reasonable. After all,
if the implementation of the algorithm works, then it is most likely
the inputs that are the problem. In statistics, problems usually enter
because the desired model cannot be estimated from the provided data.
Separating user errors from bugs in the code is usually done by
fitting simpler or alternative models.

>
> For a frequentist periodicity detector, the return value is a
> probability that, given the null hypothesis is true, the statistic
> would be this extreme. So I can construct a strong unit test by
> generating a collection of data sets given the null hypothesis,
> evaluating the statistic, and seeing whether the number that claim to
> be significant at a 5% level is really 5%. (In fact I can use the
> binomial distribution to get limits on the number of false positives.)
> This gives me a unit test that is completely orthogonal to my
> implementation, and that passes if and only if the code works. For a
> Bayesian hypothesis testing setup, I don't really see how to do
> something analogous.
>
> I can generate non-modulated data sets and confirm that my code
> returns a high probability that the data is not modulated, but how
> high should I expect the probability to be? I can generate data sets
> with models with known parameters and check that the best-fit
> parameters are close to the known parameters - but how close? Even if
> I do it many times, is the posterior mean unbiased? What about the
> posterior mode or median? I can even generate models and then data
> sets that are drawn from the prior distribution, but what should I
> expect from the code output on such a data set? I feel sure there's
> some test that verifies a statistical property of Bayesian
> estimators/hypothesis testers, but I can't quite put my finger on it.
>
> Suggestions welcome.
>
> Thanks,
> Anne
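
(As an aside, the frequentist calibration check you describe above
could be sketched roughly like the following; scipy.stats.kstest
against a uniform distribution is only a stand-in for your actual
periodicity statistic, which is not shown in this thread.)

import numpy as np
from scipy import stats

def test_false_positive_rate():
    rng = np.random.RandomState(1)
    n_trials, n_events, alpha = 1000, 200, 0.05

    rejections = 0
    for _ in range(n_trials):
        phases = rng.uniform(0.0, 1.0, n_events)      # null: unmodulated phases
        _, pvalue = stats.kstest(phases, 'uniform')   # stand-in periodicity test
        if pvalue < alpha:
            rejections += 1

    # Under the null, rejections ~ Binomial(n_trials, alpha); accept any
    # count inside a central 99.9% interval of that distribution.
    lo, hi = stats.binom.ppf([0.0005, 0.9995], n_trials, alpha)
    assert lo <= rejections <= hi

test_false_positive_rate()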

Please do not mix Frequentist or Likelihood concepts with Bayesian
ones. Also, you never generate data for estimation from the prior
distribution; you generate it from the posterior distribution, as that
is what you are estimating.

Really, in the Bayesian sense all this data generation is unnecessary
because you have already calculated that information in computing the
posteriors. The posterior of a parameter is a distribution, not a
single number, so you just compare distributions. For example, you can
compute modal values and construct Bayesian credible intervals for the
parameters; these should correspond closely to the values you
originally simulated.
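
Assuming your code can return posterior samples (for example from
MCMC), that comparison might look something like the sketch below; the
samples are faked from a normal distribution just to keep the example
self-contained, and the median is used in place of the mode for
simplicity.

import numpy as np

def credible_interval(samples, mass=0.95):
    """Central credible interval containing `mass` of the posterior samples."""
    tail = 100.0 * (1.0 - mass) / 2.0
    return np.percentile(samples, [tail, 100.0 - tail])

true_fraction = 0.4                                # value used to simulate the data
rng = np.random.RandomState(2)
posterior_samples = rng.normal(0.41, 0.02, 20000)  # stand-in for real posterior draws

lo, hi = credible_interval(posterior_samples, mass=0.95)
posterior_median = np.median(posterior_samples)

# The interval should bracket the simulated value and the point estimate
# should land close to it.
assert lo < true_fraction < hi
assert abs(posterior_median - true_fraction) < 0.05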

For Bayesian work, you must address both the data and the priors. In
particular, you need to be careful about the informativeness of the
prior. You can get great results just because your prior was
sufficiently informative, or you can get great results just because
your data was very informative.

Depending on how it was implemented, an improper prior can be an issue
because these do not guarantee a proper posterior (although they often
do lead to proper posteriors). If your posterior is improper then you
are in a very bad situation, and it can lead to weird results some or
all of the time. Sometimes this can easily be fixed, such as by putting
bounds on flat priors. Proper priors, on the other hand, give proper
posteriors.
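
As a trivial illustration of that fix, a flat prior restricted to the
physically meaningful range of, say, a modulation fraction is a proper
(normalized) density, unlike an unbounded flat prior over the whole
real line:

from scipy import stats
from scipy.integrate import quad

# Flat prior confined to [0, 1] for a modulation fraction.
bounded_flat_prior = stats.uniform(loc=0.0, scale=1.0)
total_mass, _ = quad(bounded_flat_prior.pdf, 0.0, 1.0)
print(total_mass)   # 1.0: the prior integrates to one, so the posterior is proper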

But as a final comment, it should not matter which approach you use:
if you do not get back what you simulated, then either your code is
wrong or you did not simulate what your code implements. (It is
surprising how frequent the latter is.)

Bruce


