[Spambayes] RE: Do you need to continue training ham?

Coe, Bob rcoe at CambridgeMA.GOV
Fri Sep 19 13:07:23 EDT 2003


So to emphasize the "hamminess" of a group of messages (e.g., those from a given correspondent or sent to a given list), you could move them to your "Ambiguous" folder and then click "Recover from Spam". Right? Would that approach help those who have complained of a large number of false positives without forcing them into a complete retrain?

Bob

MIS Department, City of Cambridge
831 Massachusetts Ave, Cambridge MA 02139  ·  617-349-4217  ·  fax 617-349-6165


> -----Original Message-----
> From: Tim Peters [mailto:tim.one at comcast.net]
> Sent: Thursday, September 18, 2003 7:38 PM
> To: Rob Rosenfeld; spambayes at python.org
> Subject: RE: [Spambayes] Do you need to continue training ham?
> 
> 
> [Rob Rosenfeld]
> > Hey folks.  I have moved from SpamAssassin to the SpamBayes Outlook
> > plug-in. The integration is great.  I'm a bit confused about one
> > part.  I had stockpiles of ham and spam to initially train SpamBayes
> > with.
> 
> Note that spambayes works best if you train on an approximately equal
> number of each.  It doesn't take millions <wink>, either.  For example,
> I started this project, and my home Outlook classifier still hasn't been
> trained on 2000 messages total (I get about 600 per day, and my
> classifier database is going on one year old, so I've trained on less
> than 1% of the email I've received in that time).
> 
> > If I understand correctly, every time SpamBayes detects and
> > moves a spam, it trains on it, kind of giving it "ongoing" spam
> > training.   Is that correct?
> 
> Nope.
> 
> > If it doesn't move it as spam, does it train on it as ham?
> 
> Not that either.  It auto-trains on messages for which you explicitly
> click the "Recover from Spam" or "Delete as Spam" buttons.  In addition,
> it *may* train on messages *you* move to spam or ham folders, depending
> on which boxes you've checked in the spambayes Manager's Training tab,
> section "Incremental Training".  These aren't necessarily ideal training
> protocols, but they're the best we've been able to implement so far that
> most users seem able to deal with.  Ideal would be to train on a small
> random sample of all the email you get, and expire training messages
> over time too.  That seems hard.
> 
> 
> 
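
For anyone curious what the "ideal" protocol Tim describes above (train on a small random sample of all mail, and expire training messages over time) might look like, here is a rough sketch. It is not how the Outlook plug-in works. The class name SampledTrainer, its parameters, and the learn/unlearn callbacks are all hypothetical stand-ins for whatever the real classifier exposes, and it assumes every sampled message already carries a correct ham/spam label, which is probably the part that makes this hard in practice.

```python
import random
from collections import deque


class SampledTrainer:
    """Hypothetical sketch: train on a random fraction of mail, forget old training.

    `learn` and `unlearn` are caller-supplied callbacks taking (tokens, is_spam);
    they stand in for a real classifier and are assumptions of this example.
    """

    def __init__(self, learn, unlearn, sample_rate=0.01, max_age_days=120, rng=None):
        self.learn = learn
        self.unlearn = unlearn
        self.sample_rate = sample_rate      # fraction of incoming mail to train on
        self.max_age_days = max_age_days    # training messages older than this are dropped
        self.rng = rng or random.Random()
        self.trained = deque()              # (day, tokens, is_spam), oldest first

    def observe(self, day, tokens, is_spam):
        """Handle one labelled message; return True if it was used for training.

        `day` is an integer day number and must not decrease between calls.
        """
        self.expire(day)
        if self.rng.random() >= self.sample_rate:
            return False                    # not in the sample, so no training
        self.learn(tokens, is_spam)
        self.trained.append((day, tokens, is_spam))
        return True

    def expire(self, today):
        """Untrain every remembered message older than max_age_days."""
        while self.trained and today - self.trained[0][0] > self.max_age_days:
            _, tokens, is_spam = self.trained.popleft()
            self.unlearn(tokens, is_spam)
```

With sample_rate=0.01 this would train on roughly 1% of incoming mail, about the fraction Tim mentions for his own classifier. Sampling at the same rate from ham and spam only keeps the two training sets balanced if you receive similar amounts of each, so a real version might want separate rates per class.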


