[Spambayes] train-to-exhaustion questions
David Abrahams
dave at boost-consulting.com
Fri Apr 27 21:27:51 CEST 2007
on Fri Apr 27 2007, skip-AT-pobox.com wrote:
> Dave> Good to know. I suppose there's no reason not to do this with a
> Dave> cron job, if you're that confident in it.
>
> Well, I do kill my fetchmail process so sb_bnfilter isn't trying to read the
> database while tte is trying to write it.
IIUC, tte builds a fresh DB each time. I simply run it on a new file
and then copy it over the old file.
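To make that concrete, here is a minimal sketch of the build-then-swap
idea in Python. It is not taken from tte.py: `replace_db` and the
`build` callback are names made up for illustration, and it swaps the
file in with a rename (os.replace) rather than a literal copy, so that
on POSIX systems a reader never sees a half-written database.

```python
import os
import tempfile


def replace_db(build, live_path):
    """Build a fresh database next to live_path, then swap it into place.

    `build` is a hypothetical callback that writes a complete database
    to the path it is given (for example by running the trainer).
    """
    directory = os.path.dirname(os.path.abspath(live_path))
    # Create the scratch file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".new")
    os.close(fd)
    try:
        build(tmp_path)
        # Rename over the old file; readers see either the old database
        # or the new one, never a partial write.
        os.replace(tmp_path, live_path)
    except BaseException:
        # Leave the live database untouched if the build fails.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the build fails partway, the old database stays in place and the
scratch file is removed.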
> Dave> OK... but what will happen if the real ratio of ham to spam is
> Dave> more like 412:379 and I pass a simple ratio of 3:2?
>
> All the RATIO tells it is how many spams and hams to score in one shot. 3:2
> means (if I recall correctly) that it will pick the next three spams and
> next two hams to score. It will then check their scores. Any which are
> correctly scored won't be visited in the next round. I believe those that
> are scored incorrectly will be used to update the training database at that
> point.
Yeah, that's what I thought.
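For the record, here is that description written out as a small Python
sketch. It only mirrors the behaviour described in the quoted text and
is not the actual tte.py code; `score`, `train`, `ham_step`,
`spam_step`, and the 0.5 cutoff are all assumptions made for
illustration.

```python
def train_to_exhaustion(hams, spams, ham_step, spam_step, score, train,
                        cutoff=0.5, max_rounds=10):
    """Score fixed-size batches, train on the misses, and repeat.

    Returns the number of the round in which every remaining message
    scored correctly, or None if max_rounds was reached first.
    Assumes ham_step and spam_step are both at least 1.
    """
    todo_ham, todo_spam = list(hams), list(spams)
    for round_no in range(1, max_rounds + 1):
        missed_ham, missed_spam = [], []
        h = s = 0
        while h < len(todo_ham) or s < len(todo_spam):
            # Take the next fixed-size batch from each list; once a list
            # runs out, its slice is simply empty.
            ham_batch = todo_ham[h:h + ham_step]
            spam_batch = todo_spam[s:s + spam_step]
            h += ham_step
            s += spam_step
            for msg in ham_batch:
                if score(msg) >= cutoff:   # ham scored as spam
                    train(msg, False)
                    missed_ham.append(msg)
            for msg in spam_batch:
                if score(msg) < cutoff:    # spam scored as ham
                    train(msg, True)
                    missed_spam.append(msg)
        if not missed_ham and not missed_spam:
            return round_no
        # Correctly scored messages are not revisited in the next round.
        todo_ham, todo_spam = missed_ham, missed_spam
    return None
```

Note that with fixed batch sizes the list that is shorter in proportion
runs out first, and the rest of the other list is then scored on its
own.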
> Dave> I guess I'm saying that the ratio argument is good for training
> Dave> some specific ratio of hams and spams... but does anyone really
> Dave> want to train a specific ratio? What's the use case? If you've
> Dave> supplied the ratio argument to make it easy for people to train
> Dave> everything in an unbalanced set, it's not a very good way of
> Dave> getting there.
>
> Maybe, but it works for me.
Sounds like you don't really want to discuss this point? Sorry, I
don't mean to press it. Just tell me whether you understand my
argument and would be willing to accept a patch for an --unbalanced
argument that counts the messages and traces a nice quantized line
through the whole set.
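Roughly what I have in mind, as a sketch only: `interleave` is a
hypothetical helper, not an existing tte.py function or option. It
walks both lists at once, Bresenham style, always taking from whichever
side is behind in proportion, so a 412:379 corpus is consumed evenly
and both lists finish together.

```python
def interleave(hams, spams):
    """Yield ('ham', msg) or ('spam', msg) so both lists run out together."""
    n_ham, n_spam = len(hams), len(spams)
    h = s = 0
    while h < n_ham or s < n_spam:
        # Take a ham whenever the fraction of hams consumed does not
        # exceed the fraction of spams consumed. Cross-multiplying
        # (h / n_ham <= s / n_spam) keeps the test in integers.
        if s >= n_spam or (h < n_ham and h * n_spam <= s * n_ham):
            yield ("ham", hams[h])
            h += 1
        else:
            yield ("spam", spams[s])
            s += 1
```

With 3 hams and 2 spams this yields ham, spam, ham, spam, ham. No ratio
has to be guessed up front, because the counts come from the message
lists themselves.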
> Dave> Unfortunately, I want to keep my email address and my server, so
> Dave> unless Google is going to make their spam blocking technology
> Dave> public it means SB is going to have to take on the whole job.
>
> I still use skip at pobox.com as my visible identity. You could have
> dave at boost-consulting.com forward to Gmail and then use POP3 to pick up your
> mail from their server. Then use your normal email client and use your
> boost-consulting address. You would lose the IMAP capability,
Not good for me. I really need what IMAP offers, AFAICT.
--
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
Don't Miss BoostCon 2007! ==> http://www.boostcon.com