[Spambayes] My adventures with Spambayes...

Don Chance chance at stsci.edu
Wed Aug 13 17:07:52 EDT 2003

>> Next, I tried training the filter, but kept getting:
>> imaplib.error: APPEND command error: BAD ['Invalid date-time 
>> in Append command']
>> Added some print statements to see what was going on.  After 
>> staring at the time string that caused the problem for a long 
>> time, I finally figured out what was causing the error.
>> The last part of the time was "+000" instead of "+0000". 

> [...]

> Hmm - haven't seen this before.  This is very strange - we get the
> date/time from the fetch response, preferably, and if that fails, we use
> imaplib's Time2Internaldate function to convert either the message's
> date header or the current time (if all else fails) to the right format.
> I can't see how this could be Time2Internaldate's fault, because the
> timezone is added with a "%+03d%02d" that forces the correct number of
> digits.  This presumably leaves your imap server as the culprit.

> If you run imapfilter with the switch "-i4" it will print out the imap
> commands and responses - it would be interesting to know if this is the
> case, and which imap server this is.  I suppose if it is the case, then
> a patch like the one you used is the only solution.

I also suspected the IMAP server was to blame.  We use a Mirapoint
Message Server for our mail.  I tried the "-i4" option but didn't
immediately see any "+000" timezones.
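
For illustration, the format string quoted above always yields a sign
plus four digits, so a short zone such as "+000" would have to come
from somewhere else.  The sketch below (written for current Python 3,
not the python2.2 used in this thread) shows that, along with a
hypothetical helper, pad_short_timezone, which is not part of Spambayes
or imaplib and is not the elided patch.  It assumes the missing digit
is a leading zero.

```python
import re

# The format string quoted in the reply: signed, zero-padded hours
# (width 3 including the sign) followed by two-digit minutes.
ZONE_FORMAT = "%+03d%02d"

# Hypothetical pattern: a sign and only 1-3 digits at the very end of
# an IMAP date-time, optionally followed by the closing double quote.
_SHORT_ZONE_RE = re.compile(r'([+-])(\d{1,3})("?)$')


def pad_short_timezone(date_time):
    """Left-pad a truncated zone such as +000 to four digits.

    Assumption: the server dropped a leading zero.  Strings that
    already end in a four-digit zone are returned unchanged.
    """
    match = _SHORT_ZONE_RE.search(date_time)
    if match is None:
        return date_time
    sign, digits, quote = match.groups()
    return date_time[:match.start()] + sign + digits.zfill(4) + quote


if __name__ == "__main__":
    print(ZONE_FORMAT % (0, 0))
    print(pad_short_timezone('"13-Aug-2003 17:07:52 +000"'))
```

A workaround of this kind only papers over the symptom; it does not
show where the short zone originates.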

>> Next, I tried to classify my Inbox, but kept getting 
>> assertion errors on "assert hamcount <= nham".  Added some 
>> print statements to see what was going on:


>>     assert hamcount <= nham
>> AssertionError
>> Made the following modification to classifier.py to 
>> work around the problem:


> This is just hiding a problem.  If that assertion fails, it means that
> your database is screwed and that you need to retrain from scratch.
> This is most likely a result of the failed training from the bad
> appending.  Simply deleting the database file (probably hammie.db)
> should fix the problem.

Yep, I knew I wasn't really solving anything.  I just wanted to get
around the problem with a minimum of fuss.
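
To make the failed assertion concrete, here is a minimal sketch of the
invariant it checks, assuming each per-word count records how many
trained messages contained that word.  The function name and the plain
dictionary layout are hypothetical and do not reflect how the
Spambayes database is actually stored.

```python
def find_inconsistent_words(word_counts, nspam, nham):
    """Return the words whose counts cannot be right.

    word_counts maps word -> (spamcount, hamcount).  A word cannot
    have appeared in more ham messages than were trained as ham
    (hamcount <= nham), and likewise for spam; counts also cannot
    be negative.
    """
    bad = []
    for word, (spamcount, hamcount) in sorted(word_counts.items()):
        if not (0 <= spamcount <= nspam and 0 <= hamcount <= nham):
            bad.append(word)
    return bad


if __name__ == "__main__":
    counts = {"free": (3, 0), "meeting": (0, 5)}
    # Five ham sightings of "meeting" but only four ham messages trained.
    print(find_inconsistent_words(counts, nspam=3, nham=4))
```

Any word reported by such a check means the totals and the per-word
counts disagree, which is why retraining from scratch was suggested.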

I think I know how I corrupted the database.  When I first brought up
the web interface, I clicked on the "Train as Ham" button when nothing 
was in the textbox.  It looks like that may have gotten the hamcount
out of whack from the very beginning.  

So, I renamed hammie.db and tried training again, but, of course, as
is my custom, I got another traceback: 

> python2.2 imapfilter.py -t -v -p
.type(response): <type 'str'>
response: 22 (UID 100 RFC822 {13605}
*Traceback (most recent call last):
  File "imapfilter.py", line 789, in ?
  File "imapfilter.py", line 775, in run
  File "imapfilter.py", line 607, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "imapfilter.py", line 536, in Train
    classifier.unlearn(msg.asTokens(), not isSpam)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 283, in unlearn
    self._remove_msg(wordstream, is_spam)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 424, in _remove_msg
    raise ValueError("spam count would go negative!")
ValueError: spam count would go negative!

I don't have time to look into this right now, so I just put the old,
corrupt, database back.
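
One possible reading of the traceback, not confirmed here: Train called
unlearn() with "not isSpam", which suggests the filter believed the
message had earlier been trained as the opposite class, while the
freshly created database had no such message counted.  The toy class
below is a hypothetical stand-in, not the real classifier, and only
shows why unlearning against empty counts has to fail.

```python
class ToyCounts:
    """Hypothetical stand-in that tracks only the message totals."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0

    def learn(self, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def unlearn(self, is_spam):
        # Refuse to undo training that this database never recorded.
        if is_spam:
            if self.nspam <= 0:
                raise ValueError("spam count would go negative!")
            self.nspam -= 1
        else:
            if self.nham <= 0:
                raise ValueError("non-spam count would go negative!")
            self.nham -= 1


if __name__ == "__main__":
    fresh = ToyCounts()
    try:
        fresh.unlearn(True)  # nothing was ever learned as spam
    except ValueError as exc:
        print(exc)
```

If that reading is right, whatever records "already trained as spam"
would need to be reset along with hammie.db.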

>> Tried to classify my Inbox again:
>> [...]
>>   File "imapfilter.py", line 157, in _extract_fetch_data
>>     mo = FETCH_RESPONSE_RE.match(response)
>> TypeError: expected string or buffer
>> Added code:
>> 157,159d155
>> <     if options["globals", "verbose"]:
>> <         print "type(response):", type(response)
>> <         print "response:", response
>> and repeated the command, but the error did not recur.
> If it does reoccur, I'd be interested in knowing what the type was.  It
> should either be a tuple or a string, and if it's a tuple, the first
> element should be a string and that is used.  I haven't seen this before.

Still have not seen this recur, but could the response have come back as None?
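
A None response would at least be consistent with the error: a compiled
pattern's match() raises TypeError when handed None.  The sketch below
(Python 3; response_text is a hypothetical helper, not a function in
imapfilter.py) applies the rule described in the reply, taking the
first element of a tuple, and reports any other type by name instead of
failing inside the regular expression.

```python
import re


def response_text(response):
    """Return the text part of a fetch response item.

    Accepts a string, bytes, or a tuple whose first element is one
    of those.  Anything else, including None, raises a TypeError
    that names the offending type.
    """
    if isinstance(response, tuple):
        response = response[0]
    if isinstance(response, bytes):
        response = response.decode("ascii", "replace")
    if not isinstance(response, str):
        raise TypeError(
            "unexpected fetch response type: %s" % type(response).__name__
        )
    return response


if __name__ == "__main__":
    print(response_text(("22 (UID 100 RFC822 {13605}", "...body...")))
    try:
        re.compile(r"\d+").match(None)  # same class of failure as above
    except TypeError as exc:
        print("match(None) failed:", exc)
```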

Thanks for your response,


Don Chance
Computer Sciences Corp.
Space Telescope Science Institute
3700 San Martin Dr.
Baltimore, MD 21218
