[Spambayes] Slice o' life

Tim Peters tim.one@comcast.net
Wed Oct 16 19:49:03 2002


[Rob W.W. Hooft]
> Correlations, correlations, correlations. It all boils down to
> correlations. Not the fact that there are correlations, but that they
> are very, very different from one clue to the next. All these mailman
> clues are correlated. And by not downweighting them, we're blinding the
> procedure to the other clues that do not come by the dozens...

It's not even that they're Mailman clues, though, it's more that python.org
specifically already has strong anti-spam and anti-virus measures in place.
That's how these "Mailman clues" earned their very low spamprobs to begin
with -- it's not that Mailman is stopping spam, it's that virtually all the
Mailman lists I'm on go through python.org.  So when python.org screws up,
there's little that anything can do on the user's end, short of ignoring
python.org clues as evidence.  I don't know how to automate that in a
no-brainer cross-user way (and, no, I still don't think 200K x 200K matrix
analysis is tractable for this <wink>).

So far as python.org goes, I expect it will eventually use the code
developed here, and its false negative rate should go down then (I haven't
yet seen a spam approved by python.org that *this* code scores low when the
python.org header clues are ignored).
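
A minimal sketch of what "ignoring python.org clues as evidence" could look
like at scoring time. Everything here is an assumption made for illustration:
the token names, the spamprob values, and the helper functions are
hypothetical, and the combining rule is a plain Graham-style product used as a
stand-in, not necessarily the combining scheme this project uses.

```python
import math


def combine(spamprobs):
    """Graham-style combining: P = prod(p) / (prod(p) + prod(1 - p)).

    Stand-in combining rule for illustration only. Assumes every
    spamprob lies strictly between 0 and 1.
    """
    # Work in log space so long clue lists don't underflow to zero.
    log_spam = sum(math.log(p) for p in spamprobs)
    log_ham = sum(math.log(1.0 - p) for p in spamprobs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


def score(clues, ignore=lambda token: False):
    """Score a message from a {token: spamprob} mapping, skipping any
    token for which ignore(token) is true."""
    kept = [p for token, p in clues.items() if not ignore(token)]
    if not kept:
        return 0.5  # no evidence left: treat as unsure
    return combine(kept)


# Hypothetical clues for a spam that slipped through a trusted list host.
# The "header:" tokens stand in for the correlated list-server clues.
clues = {
    "header:list-id:python.org": 0.01,
    "header:x-mailman-version": 0.01,
    "header:received:python.org": 0.02,
    "viagra": 0.99,
    "unsubscribe": 0.90,
}


def is_list_header_clue(token):
    """Hypothetical filter: treat every header-derived token as suspect."""
    return token.startswith("header:")


with_headers = score(clues)
without_headers = score(clues, ignore=is_list_header_clue)
print("all clues:       %.4f" % with_headers)
print("headers ignored: %.4f" % without_headers)
```

With these made-up numbers, the three correlated header clues outvote the two
body clues and pull the score toward ham; dropping them leaves only the body
evidence, and the score swings toward spam.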