[spambayes-bugs] [ spambayes-Bugs-884545 ] off-by-one ?
SourceForge.net
noreply at sourceforge.net
Sat Feb 21 20:46:41 EST 2004
Bugs item #884545, was opened at 2004-01-26 17:17
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=884545&group_id=61702
Category: imapfilter
Group: Source code 1.0a7
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Bip (bippo312)
Assigned to: Tony Meyer (anadelonbrin)
Summary: off-by-one ?
Initial Comment:
WinXP, python 2.3.3
Whenever I train spambayes, I later (when classifying)
get an error saying that I have more spam than total
mails. I normally just run
sb_dbexport -e -D hammie.db -f out.txt
In out.txt, the top line will say that there are 202 ham
and 3342 spam... searching through the file I find the
line
header%3ATo%3A1`170`3343`
(this is the only line in the db where the spam count is
larger than the total number of spams)
I correct this line and re-import the database, and it
then works fine. Every time I train spambayes, this
'header-to' line ends up one higher than the total number
of spams.
(If I train twice, the spam count for header-to will be 2
larger than the total number of spams, and so on.)
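
The manual check described above (finding the one record whose spam count exceeds the total) could be automated roughly as follows. This is only a sketch: the backtick-delimited `token`ham`spam`` layout is taken from the single line quoted in this report, the totals are passed in by hand rather than read from the file's top line (whose exact format is not shown here), and `find_overcounted` is a hypothetical helper name, not part of spambayes.

```python
from urllib.parse import unquote


def find_overcounted(lines, nspam, nham, delim="`"):
    """Return (token, ham_count, spam_count) for records whose
    per-token count is larger than the corresponding total.

    Assumes each record looks like: token`ham`spam` (as in the
    line quoted in the report). Lines that do not fit are skipped.
    """
    bad = []
    for line in lines:
        # Drop the newline and the trailing delimiter, then split from
        # the right so a delimiter inside the token does not break parsing.
        fields = line.strip().rstrip(delim).rsplit(delim, 2)
        if len(fields) != 3:
            continue
        token, ham, spam = fields
        try:
            ham, spam = int(ham), int(spam)
        except ValueError:
            continue  # e.g. a header line with a different layout
        if spam > nspam or ham > nham:
            bad.append((unquote(token), ham, spam))
    return bad


# First record is the one from the report; the second is made up.
sample = ["header%3ATo%3A1`170`3343`", "viagra`0`512`"]
print(find_overcounted(sample, nspam=3342, nham=202))
# prints [('header:To:1', 170, 3343)]
```

To run it against a real export, pass an open file object as `lines` and the totals from the top line of the file.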
----------------------------------------------------------------------
>Comment By: Tony Meyer (anadelonbrin)
Date: 2004-02-22 14:46
Message:
Logged In: YES
user_id=552329
Fantastic :)
----------------------------------------------------------------------
Comment By: Bip (bippo312)
Date: 2004-02-22 06:13
Message:
Logged In: YES
user_id=895052
Seems like it's fixed in 1.0a9, Thanks :)
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2004-02-17 15:58
Message:
Logged In: YES
user_id=552329
I still can't figure out what could be causing this :(
Could you try the 1.0a9 (0.9) source release, and see if the
problem has been fixed? (A number of imapfilter bugs have been,
and it's possible that this was a side effect of one of those.)
----------------------------------------------------------------------
Comment By: Bip (bippo312)
Date: 2004-02-02 18:30
Message:
Logged In: YES
user_id=895052
The 200 hams have accumulated over 2 years, the 3000+
spams are from the last 4 months :)
Yes, I am only using imapfilter.
I emptied my spam folder except for 1 untrained spam before
training. (I originally had 3590 spams, but WinXP only allows
9999 lines of scrollback, when about 30k lines were needed...
and appending '>> out.txt' didn't do anything useful)
Just before training:
Nothing new in ham folder, 1 new spam in spam folder.
During training:
See file.
Originally I had 3590 spam. The total # of spam recorded in
my database was 3590 and I had set the # of spam in the
header-to entry to 3590 as well.
After the training:
The total # of spam in the database was still 3590 (should
have been 3591??) and the # of header-to spam was listed
as 3591... (correct, I think)
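
The expectation in the numbers above can be written down as a small invariant: training one spam should raise the total spam count by one and the count of each distinct token in that message by one, so no token count can exceed the total. The toy model below only illustrates that bookkeeping; `ToyCounts` and the token `subject:offer` are made up for the example, and this is not the spambayes classifier code.

```python
from collections import defaultdict


class ToyCounts:
    """Toy stand-in for a token database: one total plus per-token counts."""

    def __init__(self):
        self.nspam = 0
        self.spam_counts = defaultdict(int)

    def train_spam(self, tokens):
        # One message: bump the total once, and each distinct token once.
        self.nspam += 1
        for tok in set(tokens):
            self.spam_counts[tok] += 1

    def is_consistent(self):
        # No token can have been seen in more spams than were trained.
        return all(c <= self.nspam for c in self.spam_counts.values())


db = ToyCounts()
db.nspam = 3590
db.spam_counts["header:To:1"] = 3590

db.train_spam(["header:To:1", "subject:offer"])
print(db.nspam, db.spam_counts["header:To:1"], db.is_consistent())
# prints 3591 3591 True

# The state reported above: token count updated, total left at 3590.
db.nspam = 3590
print(db.is_consistent())
# prints False
```

In this model the reported state (total 3590, header-to 3591) is exactly what results when the per-token update happens but the total is not updated or not saved.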
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2004-01-29 13:46
Message:
Logged In: YES
user_id=552329
1. That's quite an imbalance - you'd probably get better
results with more even numbers of ham and spam.
What are you using to train spambayes? Just imapfilter?
Could you attach the output from running imapfilter with the
-i4 switch? (Blank out the username/password stuff.)
----------------------------------------------------------------------