[Spambayes] Proposing to remove 4 combining schemes

Rob Hooft rob@hooft.net
Thu Oct 17 05:42:52 2002


Tim Peters wrote:
> I propose to remove these options and their supporting code:
> 
>     use_central_limit
>     use_central_limit2
>     use_central_limit3

Go ahead.

>     use_z_combining

I guess that means that no RMS magic can help here. Go ahead.

> Note that these three are 100% compatible at the database level:  they don't
> affect *training* at all.  The only difference among them is the
> implementation of Bayes.spamprob() (the scoring function).  A trained
> classifier can use any of these three freely.  Indeed, it's possible (no
> experiments have been done on this) that a "hard" msg for one scheme could
> benefit via getting scored again by one or both of the others.

I don't expect a lot from that. You and I at least have repeatedly seen 
the same fp's and fn's across methods.

> Now that I'm playing with a UI (Sean & Mark's code) as a user, I'm growing
> fonder of the non-chi schemes again.  Rational or not, I find that the more
> uniform range of outcomes in [0.0, 1.0] is psychologically reassuring when
> using a UI that throws the scores in your face.

But it is unrealistic. Think about the original problem again: "why 
can't software that classifies ham/spam be very easy? Almost all spams 
scream in your face that they are spam". With chi_squared combining we 
found a method that agrees with this. Most messages scream either "Ham" 
or "Spam", and there is very little left in doubt.

You can downscale things a bit by reducing the final S and H scores in 
chi_squared combining before calling chi2Q. Maybe take the sqrt or 
something similar. That is actually realistic because of correlations. 
It may shift a few messages along the middle ground, but it should not 
have much effect on separating ham and spam except for broadening the 
distribution a bit.
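
A rough sketch of one way to read the downscaling idea above, untested 
on real data. It assumes chi_squared combining feeds -2*ln(product) with 
2n degrees of freedom into chi2Q and scores (S-H+1)/2. The names 
chi_combine and damp are made up for illustration; damp=0.5 corresponds 
to taking the sqrt of the raw products.

```python
import math


def chi2Q(x2, v):
    """Return P(chisq >= x2) for v degrees of freedom; v must be even."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum slightly above 1.0.
    return min(total, 1.0)


def chi_combine(probs, damp=1.0):
    """Combine per-word spam probabilities, each strictly in (0, 1).

    damp is a hypothetical knob: it multiplies the chi-squared statistic
    before chi2Q is called. damp=1.0 is the undamped scheme; damp=0.5 is
    the same as taking the square root of the products.
    """
    n = len(probs)
    if n == 0:
        return 0.5
    # Sum logs instead of multiplying, so long messages cannot underflow.
    ln_s = sum(math.log(1.0 - p) for p in probs)
    ln_h = sum(math.log(p) for p in probs)
    S = 1.0 - chi2Q(-2.0 * damp * ln_s, 2 * n)
    H = 1.0 - chi2Q(-2.0 * damp * ln_h, 2 * n)
    return (S - H + 1.0) / 2.0


if __name__ == "__main__":
    clues = [0.8] * 10  # a moderately spammy message
    print(chi_combine(clues), chi_combine(clues, damp=0.5))
```

With damp below 1.0 a moderately spammy message lands closer to 0.5, 
while strongly spammy or hammy messages should stay near the ends.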

Maybe the better answer is that the final UI shouldn't throw the scores 
in your face.

> If there are no killer objections, I'll remove the 4 schemes in question.

Did you ever try tim combining with (S-H+1)/2?
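
To make the question concrete, a minimal sketch. It assumes that tim 
combining takes S and H to be the geometric means of the word 
probabilities p and of 1-p, and scores S/(S+H); that is my reading of 
the scheme, and the function names are invented for illustration.

```python
import math


def geometric_means(probs):
    """Return (S, H): geometric means of p and of 1-p, p strictly in (0, 1)."""
    n = len(probs)
    S = math.exp(sum(math.log(p) for p in probs) / n)
    H = math.exp(sum(math.log(1.0 - p) for p in probs) / n)
    return S, H


def tim_score(probs):
    """Assumed form of tim combining: S / (S + H)."""
    if not probs:
        return 0.5
    S, H = geometric_means(probs)
    return S / (S + H)


def tim_score_linear(probs):
    """The variant asked about: (S - H + 1) / 2 on the same S and H."""
    if not probs:
        return 0.5
    S, H = geometric_means(probs)
    return (S - H + 1.0) / 2.0


if __name__ == "__main__":
    clues = [0.9, 0.8, 0.95]
    print(tim_score(clues), tim_score_linear(clues))
```

When every word has the same probability the two forms agree, so any 
difference only shows up on messages with mixed clues.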

Rob
-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/