[Spambayes] Moving closer to Gary's ideal

Guido van Rossum guido@python.org
Mon, 23 Sep 2002 09:46:56 -0400


> One possibility is to use the spam half of the corpus I gathered last
> week.

How many msgs by now?

> You'll still have the "multiple sources" problem after a fashion,
> because the "Received" headers for the spam I gathered stop at
> mail.python.org, but the ham from your inbox winds its way through your
> ISP to your inbox.
> 
> Maybe we need a tool to strip off "Received" headers up to a
> certain point -- e.g., if Guido wants to train spambayes for his traffic,
> but run it at the python.org MTA, then it shouldn't train on the
> Received headers added by his ISP.  Or maybe there should be an option
> to ignore "Received" headers up to server X.

Neat.  I'll think about it when I get my hands on your spam corpus.
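
A rough sketch of the "ignore Received headers up to server X" idea,
assuming the standard library email package. The function name
strip_received_above and the "by <server>" match are illustrative
assumptions, not anything that exists in spambayes. Since each hop
prepends its Received header, the lines above the one stamped by
server X are the ones added after it, so those are what gets dropped:

```python
import re
from email import message_from_string


def strip_received_above(msg, server):
    """Drop Received headers added after `server` handled the message.

    Hypothetical helper: a header is treated as stamped by `server`
    when it contains "by <server>". If no such header is found, the
    message is returned unchanged.
    """
    by_server = re.compile(r"\bby\s+" + re.escape(server) + r"\b",
                           re.IGNORECASE)
    seen_server = False
    kept = []
    for name, value in msg.items():
        if name.lower() == "received" and not seen_server:
            if by_server.search(value):
                seen_server = True   # keep this hop and all earlier ones
            else:
                continue             # later hop (e.g. the ISP): drop it
        kept.append((name, value))
    if not seen_server:
        return msg
    # Rebuild the header list in its original order, minus the dropped hops.
    for name in list(msg.keys()):
        del msg[name]
    for name, value in kept:
        msg[name] = value
    return msg
```

Something like strip_received_above(message_from_string(text),
"mail.python.org") would then be run on the ham before training, so
both halves of the corpus stop at the same hop.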

--Guido van Rossum (home page: http://www.python.org/~guido/)