[Spambayes] Moving closer to Gary's ideal
Guido van Rossum
guido@python.org
Mon, 23 Sep 2002 09:46:56 -0400
> One possibility is to use the spam half of the corpus I gathered last
> week.
How many msgs by now?
> You'll still have the "multiple sources" problem after a fashion,
> because the "Received" headers for the spam I gathered stop at
> mail.python.org, but the ham from your inbox winds its way through your
> ISP to your inbox.
>
> Maybe we need a tool to strip off "Received" headers up to a
> certain point -- e.g. if Guido wants to train spambayes for his traffic,
> but run it at the python.org MTA, then it shouldn't train on the
> Received headers added by his ISP. Or maybe there should be an option
> to ignore "Received" headers up to server X.
Neat. I'll think about it when I get my hands on your spam corpus.
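
A rough sketch of the header-stripping idea quoted above, assuming Python's standard email package. The function name strip_received_above and the rule for matching "server X" (a "by <server>" clause in the Received value) are illustrative assumptions, not anything that exists in spambayes. Since each MTA prepends its Received line, the hops added after server X sit above X's own line, and those are the ones dropped:

```python
import email
import re
from email.message import Message


def strip_received_above(msg: Message, server: str) -> Message:
    """Drop Received headers added after `server` handled the message.

    Hypothetical helper: keeps the Received line stamped "by <server>"
    and every older one below it, and removes the newer ones above it.
    """
    received = msg.get_all("Received", [])
    by_server = re.compile(r"\bby\s+" + re.escape(server) + r"\b", re.IGNORECASE)

    for i, value in enumerate(received):
        if by_server.search(value):
            keep = received[i:]
            break
    else:
        # The server never stamped this message; leave it untouched.
        return msg

    # del removes every Received header; re-adding appends the kept ones
    # at the end of the header block, in their original relative order.
    del msg["Received"]
    for value in keep:
        msg["Received"] = value
    return msg


if __name__ == "__main__":
    raw = (
        "Received: from isp.example.net by home.example.net; Mon, 23 Sep 2002\n"
        "Received: from mail.python.org by isp.example.net; Mon, 23 Sep 2002\n"
        "Received: from sender.example.com by mail.python.org; Mon, 23 Sep 2002\n"
        "Subject: test\n"
        "\n"
        "body\n"
    )
    msg = strip_received_above(email.message_from_string(raw), "mail.python.org")
    for value in msg.get_all("Received"):
        print(value)
```

The match on "by <server>" is deliberately crude; a real tool would probably want to parse the Received syntax properly and decide what to do with messages that never passed through server X.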
--Guido van Rossum (home page: http://www.python.org/~guido/)