[spambayes-bugs] [ spambayes-Bugs-800392 ] Filtered "known-spam" emails don't get added to database

SourceForge.net noreply at sourceforge.net
Thu Sep 4 05:40:25 EDT 2003


Bugs item #800392, was opened at 2003-09-04 11:36
Message generated for change (Comment added) made by grab_rat
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=800392&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Graham Bartlett (grab_rat)
Assigned to: Nobody/Anonymous (nobody)
Summary: Filtered "known-spam" emails don't get added to database

Initial Comment:
I don't know if this is a bug or a "feature" - it might belong in
RFEs.  Anyway...

When an email gets recognised by the filter as spam, it gets moved to
the "known-spam" folder.  However, the filter does not seem to train
on this email as spam.  I don't know why the filter doesn't train on
emails it moves itself, when it *does* train on emails that I move
manually.  This has two main effects.

Firstly, the filter will not "reinforce" itself against words which
are almost certainly spam.  For instance, the word "girls" is only
scored 0 ham, 2 spam, when in fact the word would be very unlikely to
come up in my emails but makes a regular appearance in my spam.  This
means that some words get scored abnormally low.  I don't use an
"undecided" folder so I don't know how well the filter detects
"known-ham" emails, but I would guess it would have a similar problem
on scoring ham emails.

Secondly, the filter will not detect new words appearing in spams.  If
an email is detected as spam, all words appearing in it should be
trained on, otherwise when spams come in featuring the new words
alone, they will not be recognised as such.  A classic example here
would be the emails selling diet supplements - if I train my filter to
see "glucosamine" and "vitamin" as spam, and then I receive a spam
featuring those two words and also "echinacea", the filter should
learn that "echinacea" is also likely to be connected to spams.  When
I next get an email selling only echinacea, it'll then be correctly
detected as spam.
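
As a rough illustration of the second effect, the Python sketch below
trains on a message at the moment it is filed as spam.  It is not
SpamBayes code: the TokenCounts class, the file_message function, the
0.9 cutoff and the hand-supplied score are all assumptions made for
the example.

```python
from collections import Counter


class TokenCounts:
    """Toy per-word ham/spam counters (hypothetical, not the SpamBayes classes)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, words, is_spam):
        # Count each distinct word once per message.
        target = self.spam if is_spam else self.ham
        target.update(set(words))


def file_message(db, words, score, spam_cutoff=0.9):
    """Pick a folder for a scored message and train on it when it is filed as spam."""
    if score >= spam_cutoff:
        # The step that, per this report, seems to be skipped today.
        db.train(words, is_spam=True)
        return "known-spam"
    return "inbox"


db = TokenCounts()
# Manual training on an earlier supplement spam.
db.train(["glucosamine", "vitamin"], is_spam=True)
# A new spam arrives; the score is supplied by hand for the example.
folder = file_message(db, ["glucosamine", "vitamin", "echinacea"], score=0.99)
print(folder, db.spam["echinacea"])
```

After the automatically filed message, "echinacea" has a spam count of
1, so a later message selling only echinacea would no longer consist
entirely of unseen words.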



----------------------------------------------------------------------

>Comment By: Graham Bartlett (grab_rat)
Date: 2003-09-04 11:40

Message:
Logged In: YES 
user_id=633868

One comment to follow up.  I know it's possible to collect a bunch of
spam in the "known-spam" folder and then train on it all.  However (a)
this is inconvenient; (b) if it's known that this needs doing then you
may as well do it as soon as the spam is detected; and (c) this will
count spams twice if they've made it through the filter and been moved
manually by me, instead of being moved automatically by the filter.
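
Point (c) could be avoided by remembering which messages have already
been trained on.  A minimal sketch, assuming each message carries a
unique ID; OnceOnlyTrainer and its method are hypothetical names, not
part of SpamBayes:

```python
from collections import Counter


class OnceOnlyTrainer:
    """Toy spam trainer that refuses to count the same message twice."""

    def __init__(self):
        self.spam = Counter()
        self.trained_ids = set()

    def train_spam(self, message_id, words):
        # Skip messages already counted, whether they were first
        # trained manually or automatically.
        if message_id in self.trained_ids:
            return False
        self.trained_ids.add(message_id)
        self.spam.update(set(words))
        return True


trainer = OnceOnlyTrainer()
# First pass: message moved manually and trained on.
first = trainer.train_spam("<msg-1@example.com>", ["echinacea", "vitamin"])
# Second pass: bulk training over the whole "known-spam" folder.
second = trainer.train_spam("<msg-1@example.com>", ["echinacea", "vitamin"])
print(first, second, trainer.spam["echinacea"])
```

With this bookkeeping, bulk training on the "known-spam" folder would
leave the counts of already-trained messages unchanged.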

----------------------------------------------------------------------

