[Spambayes] Can SpamBayes be improved with Markovian Weighting or Chained Karnaugh Mapping?

Tim Peters tim.peters at gmail.com
Mon Aug 9 02:28:26 CEST 2004


[Bill Yerazunis]
> And a warning- what works nearly perfectly for one spam/nonspam mix may not
> work worth beans for another.
>
> I'm chasing this particular problem with Professor Cormack, and it's not
> trivial to solve... or even to understand.

Indeed, that's why I was so delighted to have a variety of testers,
with very different mixes, volunteer tons of testing work when
SpamBayes first started.  Several "good ideas" that helped on my test
data hurt on theirs, so they were abandoned.

Alas, nobody on this project has had time to drive that process for
many months (it's a lot of work), and the "research" part of this
project is dead as a result.  Now we have to look at what you figure
out, then steal it <wink>.

Another oddity we've seen is that some specific types of spam create a
lot more trouble for some SB users than for others.  For example,
Nigerian scams always score near 100% spam for me, but some users have
reported that they can seemingly never get them to score above
"unsure", no matter how often they train on them.  I've been hoping to
look into that for, oh, a year.  The expectation is that "doesn't work
well in a specific case" is easier to analyze than "doesn't work worth
beans period".  The latter is a more *interesting* case, though!

