[Spambayes] Training on unusual ham - revisited

Seth Goodman sethg at GoodmanAssociates.com
Thu Feb 9 00:43:32 CET 2006


On Thursday, February 02, 2006 10:35 PM -0600, Bob Posert wrote:

> Back in
> http://mail.python.org/pipermail/spambayes/2006-January/018702.html,
> Tim Peters and I had a dialog about training on unusual ham -
> monthly messages from http://www.boldtype.com.  I just got another
> one and it scored 50% on the spam scale.  The clues follow - I'd
> really appreciate any help. Thanks, Bob
>
>  Combined Score: 50% (0.5)
>  Internal ham score (*H*): 1
>  Internal spam score (*S*): 1
>
>  # ham trained on: 1229
>  #  spam trained on: 20331

Something else worth mentioning is the large total number of messages in
the training set (over 21,000).  I'm not aware of much evidence that
this harms accuracy, but most people get very good results with a few
hundred to a few thousand trained messages, and some have reported good
results with roughly 50 of each type.  If nothing else, a training set
this size makes the databases very large.
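
As an aside on the quoted clues: the 50% is consistent with my
understanding of how chi-squared combining arrives at the final score,
namely (S - H + 1) / 2, so internal ham and spam scores that are both 1
cancel each other out.  A minimal sketch of that arithmetic follows; the
function name is my own for illustration and is not the SpamBayes API.

```python
def combined_score(ham_score: float, spam_score: float) -> float:
    """Combine the internal *H* and *S* indicators into one score in [0, 1].

    Assumption: this mirrors the (S - H + 1) / 2 combining step; the
    function itself is hypothetical, written only to show the arithmetic.
    """
    return (spam_score - ham_score + 1.0) / 2.0


# Values from the quoted message: *H* = 1 and *S* = 1.
print(combined_score(ham_score=1.0, spam_score=1.0))  # 0.5
```

In other words, a score of exactly 0.5 with both indicators at 1 means the
message has strong clues pulling in both directions, not that there are
no clues at all.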

--
Seth Goodman


