[Spambayes] Serious Database Corruption Problems

jacob-spambayes-list at statisticalanomaly.com
Wed Oct 15 23:56:05 EDT 2003


>> I'm getting a little frustrated with this.  Is there
>> something I can do to keep this from happening?
>
> Do you do all your training with "sb_imapfilter.py -t"?  Up until the
> assertion error, does the training always successfully complete?  (i.e.
> it doesn't crash halfway through?)

Yes, I do all of my training that way.  The training always completes, and
then the program fails during classification.  I've included a typical
transcript below.  One thing worth noting: during training, it often
reports messages as trained even when there are no new messages in that
particular folder.

>
> If you run db_expimp.py on your database to convert it to text
> ("db_expimp.py -e -d hammie.db -f hammie.txt" if it's a pickle) and open
> it up, what are the ham and spam counts at the top?  (I suspect 0 for
> both).

suslik% more hammie.txt
311,431,

I can send you the whole file if it'd be useful.

Thanks,
Jacob

-----------------------------------------------------------------------
suslik% ./sb_imapfilter.py -l 5 -c -t -v -d hammie.db
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading state from hammie.db pickle
hammie.db is an existing pickle, with 310 ham and 417 spam
Loading database hammie.db... Done.
Training
   Training ham folder INBOX.-Wanted
.........................................................................................................................................................................................................................................................................................................
      0 trained.
   Training ham folder INBOX
.*............       1 trained.
   Training spam folder INBOX.-Spam
*................................................................................................................................................................................................................................................................................................................................................................................................................................**************
      15 trained.
Persisting hammie.db as a pickle
Training took 35.0596210957 seconds, 16 messages were trained
Classifying
...................
Classified 0 ham, 0 spam, and 0 unsure.
Classifying took 0.656105995178 seconds.
Training
   Training ham folder INBOX.-Wanted
.........................................................................................................................................................................................................................................................................................................
      0 trained.
   Training ham folder INBOX
.*............       1 trained.
   Training spam folder INBOX.-Spam
*..............................................................................................................................................................................................................................................................................................................................................................................................................................................
      1 trained.
Persisting hammie.db as a pickle
Training took 29.7854119539 seconds, 2 messages were trained
Classifying
..................*.Traceback (most recent call last):
  File "./sb_imapfilter.py", line 824, in ?
    run()
  File "./sb_imapfilter.py", line 814, in run
    imap_filter.Filter()
  File "./sb_imapfilter.py", line 675, in Filter
    self.unsure_folder)
  File "./sb_imapfilter.py", line 594, in Filter
    evidence=True)
  File "/u/jpfarmer/lib/python2.3/site-packages/spambayes/classifier.py", line 158, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/u/jpfarmer/lib/python2.3/site-packages/spambayes/classifier.py", line 395, in _getclues
    prob = self.probability(record)
  File "/u/jpfarmer/lib/python2.3/site-packages/spambayes/classifier.py", line 245, in probability
    assert spamcount <= nspam
AssertionError
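
The failing assertion compares one word's spam count against the
database-wide spam total.  A minimal sketch of how the exported hammie.txt
could be scanned for records that would trip it.  Assumptions: every line
after the "nham,nspam," header has the form "word,hamcount,spamcount,"
(only the header line is shown above, so the per-word layout is a guess),
and find_bad_records is a hypothetical helper, not part of SpamBayes.

```python
import csv


def find_bad_records(path):
    """Return (nham, nspam, bad) for an exported text database.

    Assumed layout (not confirmed by the transcript): the first line is
    "nham,nspam," and every later line is "word,hamcount,spamcount,".
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        nham, nspam = int(header[0]), int(header[1])
        bad = []
        for row in reader:
            if len(row) < 3:
                continue  # skip blank or malformed lines
            word, hamcount, spamcount = row[0], int(row[1]), int(row[2])
            # These are the records that would fail a check such as
            # "assert spamcount <= nspam" at classification time.
            if hamcount > nham or spamcount > nspam:
                bad.append((word, hamcount, spamcount))
    return nham, nspam, bad
```

Calling find_bad_records("hammie.txt") and printing the returned list would
show which words, if any, have counts larger than the totals on the first
line.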



