[spambayes-dev] Anybody still have a test ham/spam database?

Skip Montanaro skip.montanaro at gmail.com
Tue Jul 10 20:34:36 EDT 2018


> Sorry, Skip - I don't.  And I was surprised just now to see that we apparently never checked test data files into the Sourceforge source tree either!
>
> But it shouldn't matter.  SB learns pretty quickly, and it would be better to use _current_ examples of spam and ham anyway (their characteristics change over time).

Sure, but constructing a suitable ham/spam corpus from scratch is a
non-trivial task, as you no doubt remember. I could start with the
collection on mail.python.org, but I suspect I would probably let a
personal email or three leak through into what's ostensibly a public
database. (SpamBayes has been doing a pretty good job over the years
at its original assigned task.) I am looking to ensure that a Py3 port
of SpamBayes works the same as the Py2 code.

Skip

