[Spambayes] mining dates?

Tim Peters tim.one@comcast.net
Mon, 30 Sep 2002 22:16:59 -0400


[Skip Montanaro]
> ...
> It didn't prove my hypothesis, but may have exposed something as useful.
> Spam seems to be sent at a fairly constant rate throughout the day,
> which stands to reason, since it's probably all sent automatically.
> However, ham definitely seems to be sent predominantly during waking
> hours (doh!).  I'm going to give a little date mining a try.

You have my encouragement, but are you talking about date mining or time
mining?  Date mining has hurt lots of folks, by giving good results for
bogus reasons ("oops!  that whole ham archive came from 1998, and none of my
spam does").  So I suggest you *almost* stick to just time-of-day for now.
Two extensions:

1. Day of week may also be interesting.  I keep a hotmail account
   alive just to watch the spam pour in, and it definitely gets
   more spam on weekends.  I speculate that the last 500 people
   to buy a CD of email addresses can't make time until the
   weekend to become an instant internet millionaire <wink>.

2. Greg Ward suggested two Date things SpamAssassin looks for:

SPAM: *  1.6 -- Invalid Date: header (not RFC 2822)
SPAM: *  2.7 -- Date: is 24 to 48 hours before Received: date

If, OTOH, we were trying to distinguish email from Guido from the rest of
our email, a great clue would be whether it came from Guido, but an even
better one is whether his reply was sent before the original <wink>.
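Below is a minimal Python sketch of the time-of-day, day-of-week, and SpamAssassin-style Date: clues discussed above, expressed as classifier tokens. It rests on several assumptions that are not part of this thread or of the Spambayes tokenizer:

- The helper `date_tokens` and the token names (`date:hour:N`, `date:day:xxx`, `date:missing`, `date:invalid`, `date:24-48h-before-received`) are hypothetical.
- "Invalid" only means that `email.utils.parsedate_to_datetime` can't parse the header. That is laxer than a strict RFC 2822 check.
- The Received: timestamp is taken from the topmost Received: header, after its last `;`.

The sketch emits only the hour and weekday, never the year or calendar date, so the classifier can't learn archive age.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Locale-independent weekday names, indexed by datetime.weekday() (Monday == 0).
_DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


def _parse_date(value):
    """Parse an RFC 2822-style date string; return None if it can't be parsed."""
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError, OverflowError):
        # Depending on the Python version and input, failures surface as
        # TypeError (older versions) or ValueError (3.10+, bad field values).
        return None


def date_tokens(msg):
    """Yield hypothetical Date-derived tokens for a Spambayes-style tokenizer."""
    raw = msg.get("Date")
    if raw is None:
        yield "date:missing"
        return
    sent = _parse_date(raw)
    if sent is None:
        # Rough stand-in for SpamAssassin's "Invalid Date: header" test.
        yield "date:invalid"
        return
    # Time of day and day of week in the sender's own stated timezone only;
    # year/month/day are deliberately ignored to avoid learning archive age.
    yield "date:hour:%d" % sent.hour
    yield "date:day:%s" % _DAYS[sent.weekday()]

    # Compare against the topmost Received: header (normally added by the
    # last hop); its timestamp follows the final ';'.
    received = msg.get_all("Received") or []
    if not received or ";" not in received[0]:
        return
    got = _parse_date(received[0].rsplit(";", 1)[1].strip())
    if got is None or sent.tzinfo is None or got.tzinfo is None:
        return  # naive and aware datetimes can't be compared safely
    hours_early = (got - sent).total_seconds() / 3600.0
    if 24 <= hours_early <= 48:
        yield "date:24-48h-before-received"


if __name__ == "__main__":
    raw = (
        "Date: Sat, 28 Sep 2002 03:15:00 -0400\n"
        "Received: from mx.example.com by mail.example.org;"
        " Mon, 30 Sep 2002 01:00:00 -0400\n"
        "Subject: test\n"
        "\n"
        "body\n"
    )
    # Prints ['date:hour:3', 'date:day:sat', 'date:24-48h-before-received']
    print(list(date_tokens(message_from_string(raw))))
```

Emitting tokens rather than fixed scores (SpamAssassin's 1.6 and 2.7) lets the classifier learn from your own ham and spam how much each clue is actually worth.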