[spambayes-dev] Anybody still have a test ham/spam database?

Erik M. Brown kirebrow at yahoo.com
Wed Jul 11 11:38:29 EDT 2018


I look forward to this porting development, Skip. Thank you! =) I still
use SB with Windows 7 Pro and Outlook 2010.

Current stats below:

Database has 2047 good and 4629 spam.
Messages classified: 182,390
Good: 55,249 (30.3%)
Spam: 123,777 (67.9%)
Unsure: 3,364 (1.8%)
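For anyone comparing numbers across a Py2 and Py3 build, the percentages above are each category's share of all classified messages, and the three counts add up exactly to the total (55,249 + 123,777 + 3,364 = 182,390). A minimal Python sketch of that check, assuming the counts are copied by hand from the stats shown above (the variable names are illustrative, not SpamBayes names):

```python
# Counts copied by hand from the statistics above (illustrative names,
# not part of SpamBayes itself).
counts = {"Good": 55_249, "Spam": 123_777, "Unsure": 3_364}

# The three categories together make up every classified message.
total = sum(counts.values())

# Each category's share of the total, rounded to one decimal place
# the same way the stats display does.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

if __name__ == "__main__":
    print(f"Messages classified: {total:,}")
    for label, pct in shares.items():
        print(f"{label}: {counts[label]:,} ({pct}%)")
```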

Only 6 false positives... LOL! Incredible.

Please let me know how I can help as well, for example by testing in
various environments.

Take care,

Erik


-----Original Message-----
From: spambayes-dev
[mailto:spambayes-dev-bounces+kirebrow=yahoo.com at python.org] On Behalf Of
Skip Montanaro
Sent: Tuesday, July 10, 2018 8:35 PM
To: Tim Peters
Cc: spambayes-dev at python.org
Subject: Re: [spambayes-dev] Anybody still have a test ham/spam database?

> Sorry, Skip - I don't.  And I was surprised just now to see that we
> apparently never checked test data files into the SourceForge source tree
> either!
>
> But it shouldn't matter.  SB learns pretty quickly, and it would be better
> to use _current_ examples of spam and ham anyway (their characteristics
> change over time).

Sure, but constructing a suitable ham/spam corpus from scratch is a
non-trivial task, as you no doubt remember. I could start with the
collection on mail.python.org, but I suspect I would let a personal
email or three leak through into what's ostensibly a public database.
(SpamBayes has been doing a pretty good job over the years at its originally
assigned task.) I am looking to ensure that a Py3 port of SpamBayes works
the same as the Py2 code.

Skip
_______________________________________________
spambayes-dev mailing list
spambayes-dev at python.org
https://mail.python.org/mailman/listinfo/spambayes-dev