[Spambayes] more information please
Tony Meyer
tameyer at ihug.co.nz
Sat Jun 3 11:31:46 CEST 2006
> I have been leaving the category as SpamBayes set it for messages
> it had correctly identified, so presumably have been "re-training"
> it on ones it already got right. I thought this was the correct
> way, confirming that SB was right in those instances, or does it
> mean that a bias of any sort could develop?
It's not 100% clear what the best training regime is. Simulations so
far, as well as anecdotal evidence, suggest that a 'mistake-based'
training regime is probably best: for example, training only on
false positives, false negatives and unsures. An alternative is to
train only on 'nonedge' messages (e.g. those scoring between 10% and
90%).
One reason these probably work better is that the databases end up
smaller. If 'random' real words are added to a spam, they are then
less likely to be in your database, and words that aren't in the
database are ignored when the message is scored.
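
As a rough illustration, the two regimes above could be sketched like
this. This is not SpamBayes code: should_train() and its parameters are
hypothetical names made up for the example, and the 0.20/0.90 cutoffs
and the 10%-90% band are assumed values that you would adjust to match
your own setup.

```python
def should_train(score, is_spam, mode="mistake",
                 ham_cutoff=0.20, spam_cutoff=0.90,
                 edge_low=0.10, edge_high=0.90):
    """Decide whether to train on a message (illustrative sketch only).

    score   -- the classifier's spam probability, from 0.0 to 1.0
    is_spam -- what the message really is, as judged by the user
    mode    -- "mistake" or "nonedge" (hypothetical names)
    The cutoff and edge values are assumptions, not fixed constants.
    """
    if mode == "mistake":
        # Work out how the message was classified.
        if score >= spam_cutoff:
            predicted = "spam"
        elif score <= ham_cutoff:
            predicted = "ham"
        else:
            predicted = "unsure"
        actual = "spam" if is_spam else "ham"
        # Train only on false positives, false negatives and unsures.
        return predicted != actual
    if mode == "nonedge":
        # Train only on messages that did not score near 0% or 100%.
        return edge_low < score < edge_high
    raise ValueError("unknown mode: %r" % (mode,))
```

With this sketch, a spam that scored 0.95 would be skipped in either
mode, while a ham that scored 0.95 (a false positive) would be trained
on in "mistake" mode only.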
> It would be good if clear instructions similar to the above were
> included in the interface page below the list of mails processed so
> it's there for easy reference.
If you click on the "Help" icon at the bottom of the page, it says
pretty much what I did in the email, and has a link to the wiki where
training options are discussed in more detail (since there isn't a
definitive answer about what is best, it's hard to have a concise
summary distributed with the software). If you can think of ways
that the help text could be improved, please let us know (IIRC I
simply wrote what I thought of at the time, and it hasn't been
reviewed since).
=Tony.Meyer
--
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.