[Python-Dev] The first trustworthy <wink> GBayes results

Tim Peters tim.one@comcast.net
Tue, 03 Sep 2002 16:32:55 -0400


[Neil Schemenauer]
> I noticed that as well.  When the classifier goes wrong it goes badly
> wrong and using different thresholds would not help.  It seems that
> increasing the number of discriminators doesn't really help either.  Too
> bad because otherwise you could flag those messages for human
> classification.

I think it's worse than just that:  suppose any scheme says "OK, this is
spam, with probability 0.9995".  If it's reporting accurate probabilities,
then another way to read that claim is "On average, one time in 2000 this
message actually isn't spam".  In real life we have to accept that there's
no scheme with a 0% false positive rate-- not even human review --short of
the scheme that never calls anything spam.  Since deciding on the largest
acceptable false positive rate is far more a social than a technical issue,
a group of nerds will do anything rather than face it <wink>.
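The arithmetic behind "0.9995 means one time in 2000" can be checked with a tiny sketch. It assumes the classifier's scores are calibrated, meaning a score of p really does mean "spam with probability p". Both function names are hypothetical helpers for illustration only. Probabilities are passed as decimal strings so that `Fraction` keeps them exact instead of inheriting binary floating-point error.

```python
from fractions import Fraction

def one_in_n(spam_prob):
    """Return N such that, on average, one message in N scored at
    spam_prob is actually NOT spam (assumes calibrated probabilities)."""
    return 1 / (1 - Fraction(spam_prob))

def expected_false_positives(spam_prob, n_flagged):
    """Expected number of good (non-spam) messages among n_flagged
    messages, each scored at the same calibrated spam probability."""
    return (1 - Fraction(spam_prob)) * n_flagged

print(one_in_n("0.9995"))                          # → 2000
print(expected_false_positives("0.9995", 10_000))  # → 5
```

So even a scheme that is honestly "99.95% sure" on every message it flags should be expected to misfile about 5 good messages per 10,000 flagged. Picking how many of those are tolerable is the social question; the code can only state the price.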