[spambayes-bugs] [ spambayes-Bugs-884545 ] off-by-one ?

SourceForge.net noreply at sourceforge.net
Sat Feb 21 20:46:41 EST 2004


Bugs item #884545, was opened at 2004-01-26 17:17
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=884545&group_id=61702

Category: imapfilter
Group: Source code 1.0a7
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Bip (bippo312)
Assigned to: Tony Meyer (anadelonbrin)
Summary: off-by-one ?

Initial Comment:
WinXP, python 2.3.3

Whenever I train spambayes, I always (later, when 
classifying) get the error that I have more spam than 
total mails.  I normally just do sb_dbexport -e -D 
hammie.db -f out.txt

In out.txt, the top line will say that there are 202 ham 
and 3342 spam...  searching through the file I find the 
line

header%3ATo%3A1`170`3343`

(this is the only line in the db where the # of spams is 
larger than the total # of spams)

I correct this line, and re-import the database.  It now 
works fine.  Every time I train spambayes this 'header-to' 
line will always be one higher than the total # of 
spams.

(If I train twice, the number of spams for header-to will 
be 2 larger than the max # of spams... etc)
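
The manual search described above could be automated with a short script. 
This is only a sketch in present-day Python 3, not part of spambayes. It 
assumes the export layout implied by this report: a first line holding the 
ham and spam totals, then one record per token in the form 
token`hamcount`spamcount` with the token URL-quoted. The function name 
find_overcounted and the second sample record are made up for illustration.

```python
from urllib.parse import unquote

DELIM = "`"  # separator seen in the reported record; assumed for the totals line too


def find_overcounted(lines):
    """Return (nham, nspam, bad); bad lists tokens whose counts exceed the totals."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    # Assumption: the first non-blank line is ham_total`spam_total`
    nham, nspam = (int(x) for x in rows[0].split(DELIM)[:2])
    bad = []
    for row in rows[1:]:
        fields = row.split(DELIM)  # a trailing delimiter leaves an empty last field
        token, ham, spam = unquote(fields[0]), int(fields[1]), int(fields[2])
        if ham > nham or spam > nspam:
            bad.append((token, ham, spam))
    return nham, nspam, bad


# Sample data: the totals and the first record come from this report;
# the second record is invented to show a token that passes the check.
sample = [
    "202`3342`",
    "header%3ATo%3A1`170`3343`",
    "example`0`512`",
]
nham, nspam, bad = find_overcounted(sample)
for token, ham, spam in bad:
    print(f"{token}: ham={ham}/{nham} spam={spam}/{nspam}")
```

To check a real export, pass an open file instead of the sample list, 
for example find_overcounted(open("out.txt")).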

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2004-02-22 14:46

Message:
Logged In: YES 
user_id=552329

Fantastic :)

----------------------------------------------------------------------

Comment By: Bip (bippo312)
Date: 2004-02-22 06:13

Message:
Logged In: YES 
user_id=895052

Seems like it's fixed in 1.0a9, Thanks :)

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-02-17 15:58

Message:
Logged In: YES 
user_id=552329

I still can't figure out what could be causing this :(

Could you try the 1.0a9 (0.9) source release, and see if the 
problem has been fixed? (A number of imapfilter bugs have, and 
it's possible that this was a side effect of one of those.)

----------------------------------------------------------------------

Comment By: Bip (bippo312)
Date: 2004-02-02 18:30

Message:
Logged In: YES 
user_id=895052

The 200 hams have accumulated over 2 years, the 3000+ 
spams are from the last 4 months :)

Yes, I am only using imapfilter.

I emptied my spam folder except for 1 untrained spam before 
training. (I originally had 3590 spams, but WinXP only allows 
9999 lines of scrollback, when about 30k lines were needed... 
and appending  '>> out.txt' didn't do anything useful)

Just before training:
Nothing new in ham folder, 1 new spam in spam folder.

During training:
See file.


Originally I had 3590 spam.  The total # of spam recorded in 
my database was 3590 and I had set the # of spam in the 
header-to entry to 3590 as well.

After the training:
The total # of spam in the database was still 3590 (should 
have been 3591??) and the # of header-to spam was listed 
as 3591... (correct, I think)


----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-01-29 13:46

Message:
Logged In: YES 
user_id=552329

1.  That's quite an imbalance - you'd probably get better 
results with more even numbers of ham and spam.

What are you using to train spambayes?  Just imapfilter?  
Could you attach the output from running imapfilter with the 
-i4 switch?  (Blank out the username/password stuff.)

----------------------------------------------------------------------



