[Spambayes] My adventures with Spambayes...

Don Chance chance at stsci.edu
Wed Aug 13 17:07:52 EDT 2003


>> Next, I tried training the filter, but kept getting:
>> 
>> imaplib.error: APPEND command error: BAD ['Invalid date-time 
>> in Append command']
>> 
>> Added some print statements to see what was going on.  After 
>> staring at the time string that caused the problem for a long 
>> time, I finally figured out what was causing the error. 
>> The last part of the time was "+000" instead of "+0000". 

> [...]

> Hmm - haven't seen this before.  This is very strange - we get the
> date/time from the fetch response, preferably, and if that fails, we use
> imaplib's Time2Internaldate function to convert either the message's
> date header or the current time (if all else fails) to the right format.
> I can't see how this could be Time2Internaldate's fault, because the
> timezone is added with a "%+03d%02d" that forces the correct number of
> digits.  This presumably leaves your imap server as the culprit.

> If you run imapfilter with the switch "-i4" it will print out the imap
> commands and responses - it would be interesting to know if this is the
> case, and which imap server this is.  I suppose if it is the case, then
> a patch like the one you used is the only solution.

I also suspected the IMAP server was to blame.  We use a Mirapoint
Message Server for our mail.  I tried the "-i4" option but didn't
immediately see any "+000" timezones.

>> Next, I tried to classify my Inbox, but kept getting 
>> assertion errors on "assert hamcount <= nham".  Added some 
>> print statements to see what was going on:

>[...]

>>     assert hamcount <= nham
>> AssertionError
>> 
>> Made the following modification to classifier.py to 
>> workaround the problem:

>[...]

> This is just hiding a problem.  If that assertion fails, it means that
> your database is screwed and that you need to retrain from scratch.
> This is most likely a result of the failed training from the bad
> appending.  Simply deleting the database file (probably hammie.db)
> should fix the problem.

Yep, I knew I wasn't really solving anything.  I just wanted to get
around the problem with a minimum of fuss.
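
Rather than relaxing the assertion, the database could be checked for
records that violate it. The sketch below assumes a plain mapping of
word -> (spamcount, hamcount) plus the two message totals. That layout and
the name find_inconsistent_words are assumptions for illustration, not the
actual hammie.db format or a Spambayes function.

```python
def find_inconsistent_words(word_counts, nspam, nham):
    """Return the words whose counts exceed the number of trained messages.

    word_counts is an assumed mapping: word -> (spamcount, hamcount).
    A word cannot have appeared in more ham messages than were trained,
    so any hit here means the totals and the word records disagree.
    """
    bad = []
    for word, (spamcount, hamcount) in sorted(word_counts.items()):
        if spamcount > nspam or hamcount > nham:
            bad.append(word)
    return bad


if __name__ == "__main__":
    counts = {"free": (3, 0), "meeting": (0, 5)}
    print(find_inconsistent_words(counts, nspam=3, nham=4))
```

Any word reported by such a check confirms that the database needs to be
rebuilt, as suggested in the reply above.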

I think I know how I corrupted the database.  When I first brought up
the web interface, I clicked on the "Train as Ham" button when nothing 
was in the textbox.  It looks like that may have gotten the hamcount
out of whack from the very beginning.

So, I renamed hammie.db and tried training again, but, of course, as
is my custom, I got another traceback: 

> python2.2 imapfilter.py -t -v -p
[...]
.type(response): <type 'str'>
response: 22 (UID 100 RFC822 {13605}
*Traceback (most recent call last):
  File "imapfilter.py", line 789, in ?
    run()
  File "imapfilter.py", line 775, in run
    imap_filter.Train()
  File "imapfilter.py", line 607, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "imapfilter.py", line 536, in Train
    classifier.unlearn(msg.asTokens(), not isSpam)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 283, in unlearn
    self._remove_msg(wordstream, is_spam)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 424, in _remove_msg
    raise ValueError("spam count would go negative!")
ValueError: spam count would go negative!

I don't have time to look into this right now, so I just put the old,
corrupt, database back.
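
A toy sketch of how this error can arise with a fresh database: if a
message is still remembered as trained from an earlier run, the trainer
tries to unlearn it first, but the new database has nothing to subtract
from. That explanation is a guess based on the traceback, and ToyCounts
is a hypothetical stand-in, not Spambayes code.

```python
class ToyCounts:
    """Toy stand-in for a classifier's message totals (not Spambayes code)."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0

    def learn(self, is_spam):
        """Count one trained message of the given kind."""
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def unlearn(self, is_spam):
        """Undo one trained message, refusing to go below zero."""
        if is_spam:
            if self.nspam <= 0:
                raise ValueError("spam count would go negative!")
            self.nspam -= 1
        else:
            if self.nham <= 0:
                raise ValueError("non-spam count would go negative!")
            self.nham -= 1


if __name__ == "__main__":
    fresh = ToyCounts()
    try:
        # Nothing was ever learned as spam in this new database.
        fresh.unlearn(True)
    except ValueError as exc:
        print("ValueError:", exc)
```

If that guess is right, whatever records which messages were already
trained would need to be reset along with hammie.db.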

>> Tried to classify my Inbox again:
>>
>> [...]
>>
>>   File "imapfilter.py", line 157, in _extract_fetch_data
>>     mo = FETCH_RESPONSE_RE.match(response)
>> TypeError: expected string or buffer
>> 
>> Added code:
>> 157,159d155
>> <     if options["globals", "verbose"]:
>> <         print "type(response):", type(response)
>> <         print "response:", response
>> 
>> and repeated the command, but the error did not recur.
>
>
> If it does reoccur, I'd be interested in knowing what the type was.  It
> should either be a tuple or a string, and if it's a tuple, the first
> element should be a string and that is used.  I haven't seen this before.

Still have not seen this recur, but could the response have come back as None?
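
A None response would be consistent with that TypeError, since a compiled
pattern's match() rejects anything that is not a string. A defensive
sketch, assuming the response is either a string or a tuple whose first
element is a string, as described in the reply above. The name
response_text is hypothetical and this is not the actual imapfilter.py
code.

```python
def response_text(response):
    """Return the string part of a fetch response, or raise a clear error.

    Accepts a string, or a tuple whose first element is a string.
    Anything else (including None) is reported with its repr so the
    unexpected value shows up in the traceback.
    """
    if isinstance(response, tuple) and response:
        response = response[0]
    if not isinstance(response, (str, bytes)):
        raise TypeError("unexpected fetch response: %r" % (response,))
    return response


if __name__ == "__main__":
    print(response_text("22 (UID 100 RFC822 {13605}"))
    print(response_text(("22 (UID 100 RFC822 {13605}", "message body")))
```

Calling a helper like this before the regular expression match would turn
a bare "expected string or buffer" into a message that names the
offending value.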

Thanks for your response,
Don

-- 

_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Don Chance
Computer Sciences Corp.
Space Telescope Science Institute
3700 San Martin Dr.
Baltimore, MD 21218
410-338-4941
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/




