[Spambayes] incremental training strategies
Skip Montanaro
skip@pobox.com
Mon Oct 28 18:42:08 2002
Alex> Speaking from a theoretical purity standpoint, I suspect that
Alex> training it on everything that came through would be
Alex> 'cleaner'... but I have no idea if in practise it would work any
Alex> better than just training on the mistakes and unsure.
Yeah, but theory and practice often disagree. ;-) The biggest problem I see
with training on every message you encounter is that you are likely to make
mistakes, generally of the inattentive or fumble-fingered variety.
That's fine when you're testing the algorithm: you move the message to
the other pool, then test again. It's a different proposition if you
are training on messages on the fly and then deleting them (or even if you
don't delete them). How do you notice that you misclassified a message? And
if you do notice, how do you undo the effect of the misclassification,
particularly if you no longer have the message lying around?
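To make the undo problem concrete, here is a minimal sketch assuming a toy token-count store. The class, `tokenize`, `learn` and `unlearn` are hypothetical names for illustration, not the Spambayes API. Reversing a bad training step means subtracting exactly the tokens that were added, which is only possible if you still have the message:

```python
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase whitespace split, each token counted once."""
    return set(text.lower().split())


class TokenCounts:
    """Toy ham/spam token counts; a hypothetical stand-in for hammie.db."""

    def __init__(self):
        self.ham = Counter()
        self.spam = Counter()
        self.nham = 0
        self.nspam = 0

    def learn(self, tokens, is_spam):
        """Add one message's tokens to the chosen pool."""
        (self.spam if is_spam else self.ham).update(tokens)
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def unlearn(self, tokens, is_spam):
        """Reverse an earlier learn() of the same tokens with the same label."""
        counts = self.spam if is_spam else self.ham
        for tok in tokens:
            counts[tok] -= 1
            if counts[tok] <= 0:
                del counts[tok]
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1


db = TokenCounts()
toks = tokenize("Cheap meds online now")
db.learn(toks, is_spam=False)    # fumble-fingered: spam filed as ham
db.unlearn(toks, is_spam=False)  # only possible because toks is still around
db.learn(toks, is_spam=True)     # retrain with the correct label
```

If the message has already been deleted, there is nothing to hand to `unlearn()`, and the bad counts simply stay in the database.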
From the standpoint of minimizing human error, once you have a decent
hammie.db file, it seems to me that training only on unsure or
incorrectly classified messages is likely to be the best way to improve it.
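A minimal sketch of that train-on-unsure-or-error policy, assuming the filter emits a spam score in [0, 1] and uses two cutoffs to split messages into ham, unsure and spam. The 0.20/0.90 cutoff values and the function names are assumptions for illustration only:

```python
# Assumed cutoffs for illustration; real deployments tune these.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score):
    """Map a spam score in [0, 1] to one of three buckets."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def should_train(score, is_spam):
    """Train only on unsure messages and outright mistakes."""
    verdict = classify(score)
    if verdict == "unsure":
        return True
    # A confident verdict that disagrees with the user's judgement is a mistake.
    return (verdict == "spam") != is_spam


if __name__ == "__main__":
    # (score the filter produced, what the user says the message really is)
    inbox = [(0.01, False), (0.99, True), (0.55, True), (0.95, False), (0.05, True)]
    for score, is_spam in inbox:
        print(f"{score:.2f} {classify(score):6s} train={should_train(score, is_spam)}")
```

Confidently and correctly classified messages are never touched, so the user only has to make a training decision on the small set of messages the filter got wrong or couldn't call.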
Skip