[spambayes-bugs] [ spambayes-Bugs-800392 ] Filtered "known-spam"
emails don't get added to database
SourceForge.net
noreply at sourceforge.net
Thu Sep 4 05:40:25 EDT 2003
Bugs item #800392, was opened at 2003-09-04 11:36
Message generated for change (Comment added) made by grab_rat
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=800392&group_id=61702
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Graham Bartlett (grab_rat)
Assigned to: Nobody/Anonymous (nobody)
Summary: Filtered "known-spam" emails don't get added to database
Initial Comment:
I don't know if this is a bug or a "feature" - it might
belong in RFEs. Anyway...
When an email gets recognised by the filter as spam, it
gets moved to the "known-spam" folder. However the
filter does not seem to train on this email as spam. I
don't know why the filter doesn't train on emails it
moves itself, when it *does* train on email that I move
manually. This has two main effects.
Firstly, the filter will not "reinforce" itself against words
which are almost certainly spam. For instance, the
word "girls" is only scored 0 ham, 2 spam, when in fact
the word would be very unlikely to come up in my emails
but makes a regular appearance in my spam. This
means that some words get scored abnormally low. I
don't use an "undecided" folder so I don't know how well
the filter detects "known-ham" emails, but I would guess
it would have a similar problem on scoring ham emails.
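
To put rough numbers on the first effect, here is a small sketch of
how a word's spam probability might be smoothed toward a neutral
prior when it has only been seen a few times. The formula is a
Robinson-style adjustment; the constants S and X, the function name
word_spam_prob, and the totals of 100 ham and 100 spam messages are
assumptions made for illustration, not values read from the actual
filter.

```python
# Assumed smoothing constants (hypothetical, for illustration only):
# S is how strongly the neutral prior is trusted, X is the prior itself.
S = 0.45
X = 0.5


def word_spam_prob(ham_count, spam_count, n_ham, n_spam):
    """Smoothed spam probability for a single word."""
    ham_ratio = ham_count / n_ham if n_ham else 0.0
    spam_ratio = spam_count / n_spam if n_spam else 0.0
    total = ham_ratio + spam_ratio
    if total == 0:
        return X  # word never seen: fall back to the neutral prior
    raw = spam_ratio / total
    n = ham_count + spam_count
    # The fewer times the word has been seen, the closer it stays to X.
    return (S * X + n * raw) / (S + n)


# "girls" seen in 0 ham and 2 spam, versus 0 ham and 20 spam.
weak = word_spam_prob(0, 2, n_ham=100, n_spam=100)
strong = word_spam_prob(0, 20, n_ham=100, n_spam=100)
print(round(weak, 3), round(strong, 3))  # 0.908 0.989
```

Under these assumptions, a word seen in only two spams is held back
at about 0.91, while the same word seen in twenty spams reaches about
0.99. That gap is the "reinforcement" that never happens if filtered
spam is not trained on.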
Secondly, the filter will not detect new words appearing
in spams. If an email is detected as spam, all words
appearing in it should be trained on; otherwise, when
spams come in featuring only the new words, they will
not be recognised as spam. A classic example here
would be the emails selling diet supplements - if I train
my filter to see "glucosamine" and "vitamin" as spam, and
then I receive a spam featuring those two words and
also "echinacea", the filter should learn that "echinacea"
is also likely to be connected to spams. When I next
get an email selling only echinacea, it'll then be correctly
detected as spam.
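
The echinacea example can be sketched with a toy filter. This is not
how the real classifier scores mail; the ToyFilter class, its
averaging score, the 0.6 cutoff, and the train_on_filtered switch are
all hypothetical, chosen only to show the difference between training
on filtered spam and not doing so.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical word-count filter, for illustration only."""

    def __init__(self, train_on_filtered):
        self.spam = Counter()
        self.ham = Counter()
        self.train_on_filtered = train_on_filtered

    def train(self, words, is_spam):
        # Count each distinct word once per message.
        (self.spam if is_spam else self.ham).update(set(words))

    def score(self, words):
        probs = []
        for w in set(words):
            s, h = self.spam[w], self.ham[w]
            # Unseen words (s == h == 0) come out at a neutral 0.5.
            probs.append((s + 0.5) / (s + h + 1.0))
        return sum(probs) / len(probs) if probs else 0.5

    def classify(self, words, cutoff=0.6):
        is_spam = self.score(words) >= cutoff
        if is_spam and self.train_on_filtered:
            # The behaviour requested in this report.
            self.train(words, True)
        return is_spam


def run(train_on_filtered):
    f = ToyFilter(train_on_filtered)
    f.train(["glucosamine", "vitamin"], is_spam=True)    # moved manually
    f.classify(["glucosamine", "vitamin", "echinacea"])  # caught by filter
    return f.classify(["echinacea"])                     # new spam, new word only


print(run(False), run(True))  # False True
```

Without training on filtered mail, "echinacea" stays an unknown word
and the echinacea-only message scores a neutral 0.5. With it, the
word picks up a spam count from the second message and the third
message is caught.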
----------------------------------------------------------------------
>Comment By: Graham Bartlett (grab_rat)
Date: 2003-09-04 11:40
Message:
Logged In: YES
user_id=633868
One comment to follow up. I know it's possible to collect a
bunch of spam in the "known-spam" folder and then train on it
all. However, (a) this is inconvenient; (b) if it's known that
this needs doing, it may as well be done as soon as the
spam is detected; and (c) spams that got past the filter and
were moved manually by me will be counted twice, because they
were already trained on when I moved them.
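
One possible way around point (c) is to remember which messages have
already been trained on and skip them on a later pass. The sketch
below keys on the Message-ID header using Python's standard email
module; the OnceOnlyTrainer class and its counter are hypothetical
stand-ins for whatever the real training code does.

```python
import email


class OnceOnlyTrainer:
    """Hypothetical wrapper that trains on each message at most once."""

    def __init__(self):
        self.seen_ids = set()
        self.spam_trained = 0  # stand-in for the real spam training count

    def train_spam(self, raw_message):
        msg = email.message_from_string(raw_message)
        msg_id = msg["Message-ID"]  # None if the header is missing
        if msg_id is not None and msg_id in self.seen_ids:
            return False  # already counted, e.g. when moved manually
        if msg_id is not None:
            self.seen_ids.add(msg_id)
        self.spam_trained += 1
        return True


trainer = OnceOnlyTrainer()
raw = "Message-ID: <abc@example.com>\nSubject: pills\n\nBuy now\n"
first = trainer.train_spam(raw)   # manual move trains the message
second = trainer.train_spam(raw)  # later folder-wide training skips it
print(first, second, trainer.spam_trained)  # True False 1
```

Messages without a Message-ID header cannot be deduplicated this way
and would still be counted every time they are trained.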
----------------------------------------------------------------------