[Spambayes] Proposing to remove 4 combining schemes

Guido van Rossum guido@python.org
Thu Oct 17 12:53:38 2002


[Tim]
> > Now that I'm playing with a UI (Sean & Mark's code) as a user, I'm
> > growing fonder of the non-chi schemes again.  Rational or not, I
> > find that the more uniform range of outcomes in [0.0, 1.0] is
> > psychologically reassuring when using a UI that throws the scores
> > in your face.

[Rob]
> But it is unrealistic. Think about the original problem again: "why
> can't software that classifies ham/spam be very easy? Almost all
> spam screams in your face that it is spam". With chi_squared
> combining we found a method that agrees with this. Most messages
> scream either "Ham" or "Spam", and there is very little left to
> doubt.

But in real life there are also plenty of messages that mislead or
defy the human screener (if only for a second), and if these still
have a significant chance of becoming a f.p. or f.n., it would be
appropriate if the score reflected that uncertainty.  It may be clear
by now that I haven't been following recent discussions much -- but
the "all outcomes are extreme" characteristic was what led us to look
for an alternative to Graham's scheme, and I've come to appreciate
having a gray area.

> Maybe the better answer is that the final UI shouldn't throw the
> scores in your face.

While you're still deciding on how much value you place on
f.p. vs. f.n., the score can be very helpful (as long as it has a
middle ground).

--Guido van Rossum (home page: http://www.python.org/~guido/)