[Spambayes] Proposing to remove 4 combining schemes
Rob Hooft
rob@hooft.net
Thu Oct 17 05:42:52 2002
Tim Peters wrote:
> I propose to remove these options and their supporting code:
>
> use_central_limit
> use_central_limit2
> use_central_limit3
Go ahead.
> use_z_combining
I guess that means that no RMS magic can help here. Go ahead.
> Note that these three are 100% compatible at the database level: they don't
> affect *training* at all. The only difference among them is the
> implementation of Bayes.spamprob() (the scoring function). A trained
> classifier can use any of these three freely. Indeed, it's possible (no
> experiments have been done on this) that a "hard" msg for one scheme could
> benefit via getting scored again by one or both of the others.
I don't expect much from that. You and I, at least, have repeatedly seen
the same fps and fns across methods.
> Now that I'm playing with a UI (Sean & Mark's code) as a user, I'm growing
> fonder of the non-chi schemes again. Rational or not, I find that the more
> uniform range of outcomes in [0.0, 1.0] is psychologically reassuring when
> using a UI that throws the scores in your face.
But it is unrealistic. Think about the original problem again: "Why
can't software that classifies ham/spam be very easy? Almost all spams
scream in your face that they are spam." With chi_squared combining we
found a method that agrees with this. Most messages scream either "Ham"
or "Spam", and there is very little left to doubt.
You can downscale things a bit by reducing the final S and H scores in
chi_squared combining before calling chi2Q. Maybe take the sqrt or
something similar. That is actually realistic, because of the
correlations between clues. It may shift a few messages into the middle
ground, but it should not have much effect on separating ham and spam,
except for broadening the distribution a bit.
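
A minimal sketch of that downscaling idea, under stated assumptions: the
chi2Q below is a stand-in with the usual (x2, v) shape for even v, and
chi2_score with its damp hook is a hypothetical name, not the project's
code. Halving the statistic is used only for illustration; sqrt or any
other reduction could be passed in the same way.

```python
import math

def chi2Q(x2, v):
    """Return P(chi-squared with v degrees of freedom >= x2); v must be even."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi2_score(probs, damp=lambda x2: x2):
    """Combine per-clue spam probabilities, each strictly between 0 and 1.

    damp is a hypothetical hook applied to each statistic before chi2Q;
    the default leaves the statistic unchanged.
    """
    n = len(probs)
    if n == 0:
        return 0.5
    # -2 * ln(product), summed in log space to avoid underflow.
    x2_s = -2.0 * sum(math.log(1.0 - p) for p in probs)
    x2_h = -2.0 * sum(math.log(p) for p in probs)
    S = 1.0 - chi2Q(damp(x2_s), 2 * n)
    H = 1.0 - chi2Q(damp(x2_h), 2 * n)
    return (S - H + 1.0) / 2.0

strong = [0.99] * 10                      # ten clues that all scream "Spam"
plain = chi2_score(strong)
damped = chi2_score(strong, damp=lambda x2: 0.5 * x2)
```

With these inputs the damped score stays well on the spam side but sits
a little further from 1.0 than the plain one, which is the broadening
described above.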
Maybe the better answer is that the final UI shouldn't throw the scores
in your face.
> If there are no killer objections, I'll remove the 4 schemes in question.
Did you ever try tim combining with (S-H+1)/2?
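
To make the question concrete, here is a small sketch comparing the two
final formulas on the same S and H. It assumes tim combining takes S as
the geometric mean of the clue probabilities, H as the geometric mean of
their complements, and reports S/(S+H); that definition is an assumption
here, not something spelled out in this thread, and the function names
are made up for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def tim_scores(probs):
    """Return (ratio, linear) final scores computed from the same S and H."""
    S = geometric_mean(probs)
    H = geometric_mean([1.0 - p for p in probs])
    ratio = S / (S + H)             # assumed current final step
    linear = (S - H + 1.0) / 2.0    # the (S-H+1)/2 alternative
    return ratio, linear

ratio, linear = tim_scores([0.99, 0.6])
```

The two agree whenever S + H = 1 (for example when all clues have the
same probability) and differ otherwise.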
Rob
--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/