[Spambayes] Bayes Training

Thu Nov 14 00:06:33 2002

11/13/2002 5:50:59 PM, "T. Alexander Popiel" <popiel@wolfskeep.com> wrote:

>In message:  <952VONFAZVMLSQCBSMHCB97C995ROUP.3dd2e1aa@riven>
>             <tim@fourstonesExpressions.com> writes:
>>It occurs to me that perhaps *outgoing* mail might be a source of ham 
>>training.  With the presence of the smtp proxy, we *could* train the
>>database on mail that a user sends, presuming that mail that looks like
>>mail that a person sends is unlikely to be spam...
>>person sends is unlikely to be spam...
>
>Not so good, if we're parsing From addresses... one common spammer
>tactic is to make the mail appear to be coming from yourself.
>Training on a lot of data coming from the user would eliminate
>that as a spam clue...
>

Yeah, parsing on From: would be a problem, but the smtpproxy could easily
strip the From: header out, or all the headers for that matter, before passing
the message on for training.  It seems very likely to me that the words I use
in my mail are exactly the ones I would want my database to weigh in favor of
ham...
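A minimal sketch of the header-stripping idea, assuming the proxy has the raw bytes of each outgoing message. The function names are hypothetical and are not part of SpamBayes; the whitespace split stands in for SpamBayes's real tokenizer, and handing the tokens to the classifier is left out. Every header is discarded, so a spammer forging the user's own From: address gains nothing from this training.

```python
import email
from email import policy


def body_text_for_training(raw: bytes) -> str:
    """Hypothetical helper: keep only inline text/plain body parts.

    All headers (From:, To:, Subject:, ...) are dropped, so training on
    sent mail never teaches the classifier that the user's own address
    is a ham clue.
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        # walk() yields multipart containers too; skip them and any attachments.
        if part.get_content_type() == "text/plain" and not part.is_attachment():
            parts.append(part.get_content())
    return "\n".join(parts)


def tokens_for_training(raw: bytes) -> list[str]:
    # Naive lowercase whitespace tokenizer, a stand-in for the real one.
    return body_text_for_training(raw).lower().split()
```

Stripping every header, rather than only From:, is the more conservative choice: To:, Subject:, and Received: lines on sent mail look nothing like those on incoming mail, so they would add little useful ham evidence.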

>In any case, given the ham:spam ratios recently bandied about,
>I don't think there's really a problem finding sufficient ham
>from other sources. ;-)

I'm not completely convinced that the ham:spam ratios we're discussing reflect
the average email user's experience.  I think people commonly see ratios of
1:15 or 1:20, perhaps even more lopsided, while, if I recall correctly, we've
been discussing ratios with far less spam...

- TimS
- Tim
www.fourstonesExpressions.com