[Spambayes] Bayes Training
Tim Stone - Four Stones Expressions
tim@fourstonesExpressions.com
Thu Nov 14 00:06:33 2002
11/13/2002 5:50:59 PM, "T. Alexander Popiel" <popiel@wolfskeep.com> wrote:
>In message: <952VONFAZVMLSQCBSMHCB97C995ROUP.3dd2e1aa@riven>
> <tim@fourstonesExpressions.com> writes:
>>It occurs to me that perhaps *outgoing* mail might be a source of ham
>>training. With the presence of the smtp proxy, we *could* train the database
>>on mail that a user sends, presuming that mail that looks like mail that a
>>person sends is unlikely to be spam...
>
>Not so good, if we're parsing From addresses... one common spammer
>tactic is to make the mail appear to be coming from yourself.
>Training on a lot of data coming from the user would eliminate
>that as a spam clue...
>
Yeah, parsing the From: header would be a problem, but the smtpproxy could
easily strip the From: header out, or all the headers for that matter, before
sending the message for training. It seems very likely to me that the words I
use in my own mail are the ones I would want my database to weigh in favor of
ham...
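
To make the header-stripping idea concrete, here is a rough sketch using only
Python's standard email package. The function name and its parameters are
made up for illustration; this is not the smtpproxy's actual code, just one
way the stripping step might look before a message is handed off for ham
training.

```python
from email import message_from_string


def strip_for_training(raw_message, drop_all_headers=True, drop=("From",)):
    """Return the text to train on as ham (hypothetical helper)."""
    msg = message_from_string(raw_message)
    if not drop_all_headers:
        # Remove only the named headers; deleting a missing header is a no-op.
        for name in drop:
            del msg[name]
        return msg.as_string()
    # Drop every header: keep just the text/plain body text.
    bodies = [part.get_payload()
              for part in msg.walk()
              if part.get_content_type() == "text/plain"]
    return "\n".join(bodies)
```

Calling strip_for_training(raw) would give only the body words, while
strip_for_training(raw, drop_all_headers=False) keeps Subject:, To:, etc. and
removes just From:, so a forged "from yourself" address stays a usable spam
clue.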
>In any case, given the ham:spam ratios recently bandied about,
>I don't think there's really a problem finding sufficient ham
>from other sources. ;-)
I'm not completely convinced that the ham:spam ratios we're discussing are
reflective of the average email user. I think people commonly experience 1:15
or 1:20 ratios... perhaps even more... we've been discussing much lower ratios
if I recall correctly...
- TimS
>
>- Alex
- Tim
www.fourstonesExpressions.com