[spambayes-bugs] [ spambayes-Feature Requests-943116 ] White list for domains/email addresses

SourceForge.net noreply at sourceforge.net
Thu Apr 29 18:04:22 EDT 2004


Feature Requests item #943116, was opened at 2004-04-28 04:04
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=943116&group_id=61702

Category: pop3proxy
Group: None
Status: Open
Priority: 5
Submitted By: DarkLaser (darklaser)
Assigned to: Nobody/Anonymous (nobody)
Summary: White list for domains/email addresses

Initial Comment:
A nice feature would be to have a domain/email 
address white list where you could specify email 
addresses which should be marked as ham without 
regard to content.  It would also be nice to be able to 
say that anything from the domain belonging to the company 
I work for should be marked as ham regardless of 
content.
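
A minimal sketch of the kind of check being requested, assuming exact matches on the sender address or its domain. This is not SpamBayes code: the function name, the two sets, and the example addresses are all hypothetical, and the From header can be forged, so a check like this trusts whatever the sender claims.

```python
from email.utils import parseaddr

# Hypothetical whitelist entries, for illustration only.
WHITELISTED_ADDRESSES = {"friend@example.org"}
WHITELISTED_DOMAINS = {"example.com"}


def is_whitelisted(from_header,
                   addresses=WHITELISTED_ADDRESSES,
                   domains=WHITELISTED_DOMAINS):
    """Return True if the From header's address or domain is whitelisted."""
    # parseaddr returns ('', '') when the header cannot be parsed.
    _, address = parseaddr(from_header)
    address = address.lower()
    if "@" not in address:
        return False
    domain = address.rsplit("@", 1)[1]
    return address in addresses or domain in domains


print(is_whitelisted("David <david@example.com>"))
print(is_whitelisted("Someone <someone@elsewhere.invalid>"))
```

A message that passes this check would be treated as ham without being scored at all.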

Anyway, my 2bits.

Thanks,
David

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2004-04-30 10:04

Message:
Logged In: YES 
user_id=552329

Oh, one other thing - is there any reason that you can't
just use the rules in your mail client (Outlook Express?) to
implement whitelisting yourself?  It's certainly the simplest
solution and works for most people.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-04-30 10:03

Message:
Logged In: YES 
user_id=552329

As the FAQ says, we realise that some people want
whitelisting.  However, none of the developers do, and none
have any interest in putting in the considerable effort needed
to develop whitelisting capabilities.  As such, this simply
won't be added until someone comes along with code in hand.
The developers aren't refusing to add it (as long as it
is off by default) but there just isn't any incentive for us
to add it.  Anyone desperate for the functionality always
has the option to (a) write code or (b) pay for a product
that does have whitelisting (InBoxer, for example).

In any case, as Kenny said, 85% false positives is
unbelievably poor.  You'd be better off not using spambayes
at all!  The fp rate should be less than 5% - typically
around 1% or lower.  Something is clearly wrong with your
training.

I'd still rather have this closed.  If I thought that people
would actually see it and not open a new request, that would
be different, but that doesn't happen.  If it stays open then
we have two places (here and the FAQ) where information
collects.  Whitelisting is brought up so often that a
tracker really isn't necessary, IMO.

----------------------------------------------------------------------

Comment By: DarkLaser (darklaser)
Date: 2004-04-30 07:31

Message:
Logged In: YES 
user_id=1030399

Yes, I had stored up spam from the last 8 months in case I 
came across a Bayesian filter I wanted to train with it, but it 
all came from this one account.  The nearly 5,000 valid emails 
are all my valid email for this account over the last 5 years.  
So I would have thought it should have worked very well.

Perhaps SpamBayes learns from initial training sets differently 
than from email it processes as it comes in.  So perhaps the 
solution is to remove the past training and just start training 
from scratch.  

Yes, I'll wait for a few more false positives, and I'll post them 
before wiping it out and starting over.

David

----------------------------------------------------------------------

Comment By: Kenny Pitt (kpitt)
Date: 2004-04-30 06:49

Message:
Logged In: YES 
user_id=859086

It's very unusual to hear from someone who is getting 
accuracy this poor.  Could you upload a copy of the spam 
clues for a false positive message (before training on it)?  
Seeing why SpamBayes thought the way it did when it first 
processed the message would help a lot.

I notice that you have far more training data than you have 
messages that have been processed by SpamBayes, so I 
assume you had a large initial training set.  Is it possible that 
your training data was not representative of the messages 
that you are currently receiving?

Although there is no proven best training strategy, in general 
SpamBayes seems to perform best if you initially train with 
only 5 or 10 of each type of message and then train it up on 
your current message stream instead of training it on lots of 
outdated messages.  You'll also find that SpamBayes is more 
responsive to training of new messages when you have fewer 
messages in the training database.  With the large number of 
messages that you have, it will take a *LOT* of training to 
overcome existing clues.
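
A rough illustration of why a large training database is slow to respond to new training. It uses a simplified per-token estimate (the ratio of spam and ham frequencies), which is an assumption made for illustration and not SpamBayes's actual scoring. The database sizes 9728 and 4939 are the totals reported in this thread; the token counts and function names are hypothetical.

```python
def token_spamprob(spam_count, ham_count, nspam, nham):
    """Simplified per-token spam probability from training counts."""
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    return spam_ratio / (spam_ratio + ham_ratio)


def ham_trainings_to_flip(spam_count, nspam, nham, threshold=0.5):
    """Count how many new ham messages containing the token must be
    trained before its spam probability drops below the threshold."""
    ham_count = 0
    trained = 0
    while token_spamprob(spam_count, ham_count, nspam, nham) >= threshold:
        ham_count += 1   # the token appears in one more ham message
        nham += 1        # and the ham total grows by that message
        trained += 1
    return trained


# A token seen in half of the trained spam and in no ham so far.
small = ham_trainings_to_flip(spam_count=5, nspam=10, nham=10)
large = ham_trainings_to_flip(spam_count=4864, nspam=9728, nham=4939)
print(small, large)
```

Under this simplified model, the number of corrections needed grows with the size of the existing database, which matches the advice to start small and train on the current message stream.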

----------------------------------------------------------------------

Comment By: DarkLaser (darklaser)
Date: 2004-04-30 01:46

Message:
Logged In: YES 
user_id=1030399

Anadelonbrin, thanks for the URL.  I had looked for something 
about white lists, but couldn’t find it.  

I maintain that a white list would be useful.  Perhaps not to 
some, but very much so for others.  I have received maybe 1 
or 2 spam claiming to be from someone on the domain for the 
company I work for in the last year, and never have received 
any claiming to be from any of my 4 personal domains.  
However, the current false positive ratio is horrible: 85% 
of my good email is falsely being marked as spam.  Look at 
the number of emails I have trained; with that many, I 
should be getting near perfect results.
-------------------------------------------------
Total emails trained: Spam: 9728 Ham: 4939
SpamBayes has processed 546 messages - 4 (1%) good, 538 
(99%) spam and 4 (0%) unsure.
29 messages were manually classified as good (23 were false 
positives).
517 messages were manually classified as spam (0 were false 
negatives).
2 unsure messages were manually identified as good, and 2 as 
spam.
-------------------------------------------------
Ignoring the unsure messages, out of 27 good emails, 4 were 
actually marked as good and 23 as spam.  That is ridiculous.  
Perhaps I need to change something in my settings, but the 
majority of those good emails are from this one domain, so in 
my case a white list would make a world of difference.  If one 
or two spam a year get through because of a white list, no 
biggie, I can handle that.  That’s a lot easier than having to 
go manually remove the word 'spam,' from the subject of 85% 
of my email.  
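
The 85% figure follows directly from the counts quoted above, ignoring the unsure messages. A short check (variable names are illustrative):

```python
# Counts taken from the statistics quoted in this comment.
good_marked_good = 4    # good messages SpamBayes classified as good
good_marked_spam = 23   # good messages classified as spam (false positives)

total_good = good_marked_good + good_marked_spam
false_positive_rate = good_marked_spam / total_good

print(f"{good_marked_spam} of {total_good} good messages marked as spam: "
      f"{false_positive_rate:.0%}")
```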

I don’t have any experience with Python (I’m a Perl man 
myself), otherwise I would look at building a white list to send 
to the project manager.  Anyway, I still think this item should 
remain on the wish list.

Thanks,
David

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-04-29 13:45

Message:
Logged In: YES 
user_id=552329

Please see FAQ 6.6:

<http://spambayes.org/faq.html#why-don-t-you-add-whitelisting-blacklisting-to-spambayes>

----------------------------------------------------------------------
