[Spambayes] train-to-exhaustion questions

David Abrahams dave at boost-consulting.com
Fri Apr 27 21:27:51 CEST 2007


on Fri Apr 27 2007, skip-AT-pobox.com wrote:

>     Dave> Good to know.  I suppose there's no reason not to do this with a
>     Dave> cron job, if you're that confident in it.
>
> Well, I do kill my fetchmail process so sb_bnfilter isn't trying to read the
> database while tte is trying to write it.

IIUC, tte builds a fresh DB each time.  I simply run it on a new file
and then copy it over the old file.
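
For what it's worth, here is a rough sketch of the "build on a new file,
then copy it over" step. It assumes the database is a single file and
that a rename within one directory is atomic (true on POSIX). The name
replace_db and the ".tte-" prefix are made up for illustration; none of
this is part of SpamBayes.

```python
import os
import shutil
import tempfile


def replace_db(new_db_path, live_db_path):
    """Copy new_db_path next to live_db_path, then swap it in with a rename."""
    live_dir = os.path.dirname(os.path.abspath(live_db_path))
    # Stage the copy in the same directory so the final rename stays on
    # one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=live_dir, prefix=".tte-")
    os.close(fd)
    try:
        shutil.copyfile(new_db_path, tmp_path)
        # A reader sees either the old file or the new one, never a
        # half-written copy.
        os.replace(tmp_path, live_db_path)
    except BaseException:
        # Clean up the staged copy if anything went wrong before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Whether that is enough to leave fetchmail running depends on how the
filter opens the database, so treat it as a starting point only.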

>     Dave> OK... but what will happen if the real ratio of ham to spam is
>     Dave> more like 412:379 and I pass a simple ratio of 3:2?  
>
> All the RATIO tells it is how many spams and hams to score in one shot.  3:2
> means (if I recall correctly) that it will pick the next three spams and
> next two hams to score.  It will then check their scores.  Any which are
> correctly scored won't be visited in the next round.  I believe those that
> are scored incorrectly will be used to update the training database at that
> point.

Yeah, that's what I thought.

>     Dave> I guess I'm saying that the ratio argument is good for training
>     Dave> some specific ratio of hams and spams... but does anyone really
>     Dave> want to train a specific ratio?  What's the use case? If you've
>     Dave> supplied the ratio argument to make it easy for people to train
>     Dave> everything in an unbalanced set, it's not a very good way of
>     Dave> getting there.
>
> Maybe, but it works for me.

Sounds like you don't really want to discuss this point?  Sorry, I
don't mean to press it.  Just tell me whether you understand my
argument and would be willing to accept a patch for an --unbalanced
argument that counts the messages and traces a nice quantized line
through the whole set.  
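
To make "a nice quantized line" concrete, here is a minimal sketch of the
interleaving I have in mind. Given the two message counts, it picks at
each step whichever class is furthest behind its share, much as a
line-drawing algorithm steps along a diagonal. The function name
quantized_schedule is hypothetical and this is not existing tte code.

```python
def quantized_schedule(n_ham, n_spam):
    """Yield "ham" or "spam" n_ham + n_spam times, evenly interleaved."""
    ham_done = spam_done = 0
    while ham_done < n_ham or spam_done < n_spam:
        # Compare ham_done / n_ham with spam_done / n_spam by
        # cross-multiplying, which avoids division (and division by zero).
        if ham_done < n_ham and ham_done * n_spam <= spam_done * n_ham:
            ham_done += 1
            yield "ham"
        else:
            spam_done += 1
            yield "spam"


# With the 412:379 set from above, every message is visited exactly once.
order = list(quantized_schedule(412, 379))
print(order.count("ham"), order.count("spam"))
```

The point is that nobody has to guess a small ratio like 3:2; the counts
themselves determine the order.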

>     Dave> Unfortunately, I want to keep my email address and my server, so
>     Dave> unless Google is going to make their spam blocking technology
>     Dave> public it means SB is going to have to take on the whole job.
>
> I still use skip at pobox.com as my visible identity.  You could have
> dave at boost-consulting.com forward to Gmail and then use POP3 to pick up your
> mail from their server.  Then use your normal email client and use your
> boost-consulting address.  You would lose the IMAP capability, 

Not good for me.  I really need what IMAP offers, AFAICT.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

Don't Miss BoostCon 2007! ==> http://www.boostcon.com

