[Spambayes] Moving closer to Gary's ideal

Greg Ward gward@python.net
Mon, 23 Sep 2002 09:45:08 -0400


On 22 September 2002, Guido van Rossum said:
> That's impossible to know in my case.  Almost all of my mail goes
> through the SpamAssassin setup at python.org, which throws all spam
> away.  As a result I see maybe 1 spam for every 50 hams -- but that's
> not the spam/ham ratio seen by the MTA for guido@python.org.

One possibility is to use the spam half of the corpus I gathered last
week.  You'll still have the "multiple sources" problem after a fashion,
because the "Received" headers for the spam I gathered stop at
mail.python.org, while the ham in your inbox picks up more of them on its
way through your ISP.

Maybe we need a tool to strip off "Received" headers up to a
certain point -- e.g., if Guido wants to train spambayes for his traffic,
but run it at the python.org MTA, then it shouldn't train on the
Received headers added by his ISP.  Or maybe there should be an option
to ignore "Received" headers up to server X.
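
Here's a rough sketch of what such a tool might look like, using the
standard "email" package.  Nothing like this exists in spambayes as far as
I know: the names strip_received_after() and _by_host() are made up for
illustration, and pulling the host out of the "by" clause with a regex is
an assumption that won't hold for every Received header in the wild.
Since each hop prepends its Received header, the ones added after server X
sit above the one stamped "by X", so those are the ones dropped.

```python
import re
from email import message_from_string

# Hypothetical helper: grab the host named in the "by" clause of a
# Received header value.  Real-world headers vary a lot; this is a guess.
_BY_RE = re.compile(r"\bby\s+([^\s;]+)", re.IGNORECASE)


def _by_host(value):
    match = _BY_RE.search(value)
    return match.group(1).lower() if match else None


def strip_received_after(msg, server):
    """Drop Received headers added by hops downstream of `server`.

    Hypothetical function, not part of spambayes.  If no Received
    header is stamped "by <server>", the message is left untouched.
    """
    server = server.lower()
    headers = msg.items()  # (name, value) pairs in original order

    # Find the topmost (most recent) Received header stamped by `server`.
    cut = None
    for i, (name, value) in enumerate(headers):
        if name.lower() == "received" and _by_host(value) == server:
            cut = i
            break
    if cut is None:
        return msg

    # Keep every non-Received header, plus Received headers from `cut` on.
    kept = [(name, value) for i, (name, value) in enumerate(headers)
            if name.lower() != "received" or i >= cut]

    # Rebuild the header list so the original ordering is preserved.
    for name in {name for name, _ in headers}:
        del msg[name]
    for name, value in kept:
        msg[name] = value
    return msg
```

You'd run each ham message through
strip_received_after(msg, "mail.python.org") before training, so the
classifier never sees the ISP's hops.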

        Greg
-- 
Greg Ward <gward@python.net>                         http://www.gerg.ca/
Are we THERE yet?