[Spambayes] Spam Clues ????? ??????
Amedee Van Gasse
amedee at amedee.be
Mon Apr 28 14:33:29 CEST 2008
On Wed, April 16, 2008 19:34, David wrote:
>
> Am getting loads of spam with cyrillic characters and would like to know if
> Spambayes can automatically delete anything with these characters in their
> headers. Below is score info for typical one. If you need it, could send
> you the config file if you can tell me where to find it.
>
> Kindest regards
> David Kanareck
>
> Combined Score: 57% (0.567348)
>
> Internal ham score (*H*): 0.285187
> Internal spam score (*S*): 0.419882
>
> # ham trained on: 39
> # spam trained on: 76
That is not much training. In my experience, Spambayes gets *extremely*
accurate after about 100 hams and 100 spams. Your mileage may vary.
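A side note on the numbers quoted above: the combined score is not an average of the two internal scores. Assuming the default chi-squared combining, it is derived from them as (S - H + 1) / 2. A minimal Python sketch of that arithmetic (the function name is made up for illustration, it is not a SpamBayes API):

```python
def combined_score(ham_score, spam_score):
    """Combine the internal *H* and *S* scores into one value in [0, 1].

    Assumption: this mirrors chi-squared combining, (S - H + 1) / 2.
    """
    return (spam_score - ham_score + 1.0) / 2.0


# Internal scores taken from the report quoted above.
score = combined_score(ham_score=0.285187, spam_score=0.419882)
print(round(score, 4))  # prints 0.5673, i.e. the reported 57%
```

When H and S are both low, as here, the result lands near the middle: the filter is saying it does not have enough evidence either way.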
With the Outlook plugin, I add a column that shows the spam score (see the
FAQ/wiki for details) and sort on it, highest score first. I look at the
bottom of the list and find the one spam with the lowest score. Train as
spam. Rescore inbox. Now I look at the top and find the one ham with the
highest score. Train as ham, rescore. Back to the lowest spam, rescore.
Highest ham, rescore. Lather, rinse, repeat. Very quickly you will see that
all spam scores above 99% and all ham scores below 1%.
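The selection step in that routine can be sketched in a few lines of Python. This is only an illustration of the idea (always train on the worst-scored message of each kind), not part of SpamBayes or the Outlook plugin; the `Scored` type, the `next_to_train` helper, and the sample inbox are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Scored(NamedTuple):
    subject: str
    score: float    # spam score in [0, 1] as shown in the extra column
    is_spam: bool   # the human's judgement, not the filter's


def next_to_train(messages: List[Scored], want_spam: bool) -> Optional[Scored]:
    """Pick the message of the requested kind that the filter scores worst.

    For spam that is the lowest score, for ham it is the highest score.
    """
    candidates = [m for m in messages if m.is_spam == want_spam]
    if not candidates:
        return None
    if want_spam:
        return min(candidates, key=lambda m: m.score)
    return max(candidates, key=lambda m: m.score)


# Hypothetical inbox, for illustration only.
inbox = [
    Scored("cheap pills", 0.97, True),
    Scored("koi8-r newsletter", 0.57, True),
    Scored("meeting notes", 0.04, False),
    Scored("invoice from supplier", 0.41, False),
]

print(next_to_train(inbox, want_spam=True).subject)   # lowest-scoring spam
print(next_to_train(inbox, want_spam=False).subject)  # highest-scoring ham
```

After training on each pick, the whole inbox has to be rescored before choosing the next one, because every training step shifts the token statistics.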
This method of training is so kewl that I have actually considered
installing Outlook on Linux, just so that I could train Spambayes this
way.
> 'message.' 0.310872 15 13
>
> 'date:' 0.325631 14 13
>
> 'checked' 0.341867 13 13
>
> 'database:' 0.341867 13 13
>
> 'incoming' 0.341867 13 13
>
> 'version:' 0.341867 13 13
>
> 'virus' 0.35698 14 15
>
> 'release' 0.358294 13 14
>
> 'avg.' 0.359817 12 13
>
> 'skip:2 10' 0.359817 12 13
>
> 'found' 0.385564 14 17
These are generic tokens added by your virus scanner. After more training
they will score around 0.5, which means they will neither increase nor
decrease the overall spam score of a message.
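To see why such tokens drift toward 0.5, it helps to look at how a single token score can be computed from the counts in the clue list (the two trailing numbers on each line, read here as ham count and spam count). The sketch below assumes a Robinson-style estimate with a prior strength of 0.45 and a prior probability of 0.5; the function name and parameter names are illustrative, not the SpamBayes API. Under those assumptions it agrees with the values quoted in this thread.

```python
def token_spamprob(ham_count, spam_count, nham, nspam,
                   strength=0.45, unknown_prob=0.5):
    """Estimate the spam probability of one token.

    ham_count / spam_count: trained messages of each kind containing the token.
    nham / nspam: total trained ham and spam (assumed to be non-zero).
    strength / unknown_prob: assumed prior settings, see the text above.
    """
    n = ham_count + spam_count
    if n == 0:
        return unknown_prob  # never seen: fall back to the prior
    hamratio = ham_count / nham
    spamratio = spam_count / nspam
    prob = spamratio / (hamratio + spamratio)
    # Pull the raw ratio toward the prior; the pull weakens as n grows.
    return (strength * unknown_prob + n * prob) / (strength + n)


# Counts from the quoted report: 39 ham and 76 spam trained.
print(token_spamprob(15, 13, 39, 76))  # 'message.', quoted as 0.310872
print(token_spamprob(0, 2, 39, 76))    # 'from:charset:koi8-r', quoted as 0.908163

# A token seen in the same fraction of ham and spam carries no information.
print(token_spamprob(100, 100, 200, 200))
```

The virus scanner tokens currently lean hammy only because they happen to appear in a larger share of the 39 trained hams than of the 76 trained spams. Once they show up in roughly the same fraction of both, the ratio settles near 0.5.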
> 'to:no real name:2**0' 0.750084 10 59
>
> 'header:Received:1' 0.893006 1 18
Interesting tokens...
> 'from:charset:koi8-r' 0.908163 0 2
>
> 'subjectcharset:koi8-r' 0.908163 0 2
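And those last two are *really* interesting tokens! They show that
Spambayes has already started to treat the koi8-r (Cyrillic) charset in the
From and Subject headers as a strong spam clue.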
Keep on training, I can already see that your Spambayes is improving.
--
Amedee Van Gasse
amedee at amedee.be
Disclaimer:
By sending an email to ANY of my addresses you are agreeing that:
1. I am by definition, "the intended recipient"
2. All information in the email is mine to do with as I see fit and
make such financial profit, political mileage, or good joke as it lends
itself to. In particular, I may quote it on usenet.
3. I may take the contents as representing the views of your company.
4. This overrides any disclaimer or statement of confidentiality that
may be included on your message.