[Spambayes] Spam Database available?

Tony Meyer tameyer at ihug.co.nz
Thu Jun 30 03:36:39 CEST 2005


> My question is there available a collection of recent SPAM emails 
> available that I can access to train SpamBayes?

You can get spam from <http://spamarchive.org> and the SpamAssassin Public
Archive <http://spamassassin.apache.org/publiccorpus/>.

Generally this isn't a good idea, though.  You should try to just train on
spam that you get, as that will look most like any future spam that you get.
I agree with Jesse that doing train-on-error (including unsures) is the best
way to go.

<http://entrian.com/sbwiki/TrainingIdeas> has much more about this.
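
For anyone who wants to see the idea spelled out, here is a rough sketch of train-on-error (including unsures) in Python.  It is not SpamBayes code: score_fn and train_fn are hypothetical stand-ins for whatever scores and trains your messages, and the 0.20/0.90 cutoffs are assumed values that you should replace with your own settings.

```python
# Assumed cutoffs for illustration only; use your own configured values.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def label(score):
    """Map a spam probability in [0, 1] to 'ham', 'spam' or 'unsure'."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


def train_on_error(messages, score_fn, train_fn):
    """Train only on messages that were misclassified or scored unsure.

    messages: iterable of (text, is_spam) pairs with the known answer.
    score_fn: hypothetical callable returning a spam probability for text.
    train_fn: hypothetical callable taking (text, is_spam) to train on it.
    Returns the number of messages that were trained on.
    """
    trained = 0
    for text, is_spam in messages:
        verdict = label(score_fn(text))
        expected = "spam" if is_spam else "ham"
        if verdict != expected:  # wrong answer, or unsure
            train_fn(text, is_spam)
            trained += 1
    return trained
```

Messages that were already classified correctly are skipped, which keeps the database small and focused on the mail that actually caused trouble.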

> The problems started when it would classify obvious "SPAM" as 
> "HAM" but when you went to correct it, the email was not in the
> review screen,

If mail doesn't appear in the review page (and you haven't changed any of
the caching options, and it arrived less than 7 days ago), then it most likely
*wasn't* classified as ham; it probably failed to be classified at all.  Check
the message for an "X-Spambayes-Exception" header.  If there is one, then
something went wrong while trying to classify the message.  If you can't
figure out what the problem is (the FAQ may help), then ask here, including
the content of the exception header.
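
If you have the raw message saved somewhere, a minimal sketch like the following will pull the header out using Python's standard email module.  The function name and the sample message (including the header's value) are made up for illustration; only the header name comes from SpamBayes.

```python
from email import message_from_string


def spambayes_exception(raw_message):
    """Return the X-Spambayes-Exception header value, or None if absent."""
    msg = message_from_string(raw_message)
    return msg.get("X-Spambayes-Exception")


# Hypothetical sample message; a real one would come from your mail client.
sample = (
    "From: someone@example.com\n"
    "Subject: test\n"
    "X-Spambayes-Exception: hypothetical traceback text\n"
    "\n"
    "body\n"
)
print(spambayes_exception(sample))
```

If this returns None, the message has no exception header, so classification did not fail in that way.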

> With 1.04 I already have over 100 SPAM emails that it has 
> received and still all it says is that it's "unsure" it
> almost never categorizes anything as "SPAM" anymore.

See FAQ 4.7:
<http://spambayes.org/faq.html#why-did-spambayes-mark-this-obvious-spam-unsure>.
The most likely cause is a database imbalance, but we need to see
clues to know for sure.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this. 


