[Spambayes] My adventures with Spambayes...
Don Chance
chance at stsci.edu
Wed Aug 13 17:07:52 EDT 2003
>> Next, I tried training the filter, but kept getting:
>>
>> imaplib.error: APPEND command error: BAD ['Invalid date-time
>> in Append command']
>>
>> Added some print statements to see what was going on. After
>> staring at the time string that caused the problem for a long
>> time, I finally figured out what was causing the error.
>> The last part of the time was "+000" instead of "+0000".
> [...]
> Hmm - haven't seen this before. This is very strange - we get the
> date/time from the fetch response, preferably, and if that fails, we use
> imaplib's Time2Internaldate function to convert either the message's
> date header or the current time (if all else fails) to the right format.
> I can't see how this could be Time2Internaldate's fault, because the
> timezone is added with a "%+03d%02d" that forces the correct number of
> digits. This presumably leaves your imap server as the culprit.
> If you run imapfilter with the switch "-i4" it will print out the imap
> commands and responses - it would be interesting to know if this is the
> case, and which imap server this is. I suppose if it is the case, then
> a patch like the one you used is the only solution.
I also suspected that the IMAP server was to blame. We use a Mirapoint
Message Server for our mail. I tried the "-i4" option but didn't
immediately see any "+000" timezones.
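For reference, a minimal sketch of the two pieces under discussion: how the
quoted "%+03d%02d" pattern always yields a sign plus four digits, and one way
a truncated all-zero zone could be padded before the APPEND. The helper names
zone_field and pad_zero_zone and the sample date string are made up for
illustration; this is neither the patch from the earlier message nor
imaplib's own code, and it only handles the "+000" case seen here.

```python
import re


def zone_field(hours, minutes):
    """Apply the quoted format: sign plus two-digit hours, two-digit minutes."""
    return "%+03d%02d" % (hours, minutes)


# Hypothetical repair: matches a sign followed by exactly three zeros at the
# end of the string (optionally before a closing quote).
_SHORT_ZERO_ZONE_RE = re.compile(r'([+-])000("?)$')


def pad_zero_zone(internaldate):
    """Return internaldate with a trailing "+000" or "-000" padded to four digits."""
    return _SHORT_ZERO_ZONE_RE.sub(r'\g<1>0000\2', internaldate)


if __name__ == "__main__":
    print(zone_field(0, 0))
    print(pad_zero_zone('"13-Aug-2003 17:07:52 +000"'))
```

A well-formed zone such as "+0000" or "-0400" does not match the pattern and
passes through unchanged, so the padding step should be safe to apply to
every date string.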
>> Next, I tried to classify my Inbox, but kept getting
>> assertion errors on "assert hamcount <= nham". Added some
>> print statements to see what was going on:
>[...]
>> assert hamcount <= nham
>> AssertionError
>>
>> Made the following modification to classifier.py to
>> work around the problem:
>[...]
> This is just hiding a problem. If that assertion fails, it means that
> your database is screwed and that you need to retrain from scratch.
> This is most likely a result of the failed training from the bad
> appending. Simply deleting the database file (probably hammie.db)
> should fix the problem.
Yep, I knew I wasn't really solving anything. I just wanted to get
around the problem with a minimum of fuss.
I think I know how I corrupted the database. When I first brought up
the web interface, I clicked on the "Train as Ham" button when nothing
was in the textbox. It looks like that may have gotten the hamcount
out of whack from the very beginning.
So, I renamed hammie.db and tried training again, but, of course, as
is my custom, I got another traceback:
> python2.2 imapfilter.py -t -v -p
[...]
.type(response): <type 'str'>
response: 22 (UID 100 RFC822 {13605}
*Traceback (most recent call last):
  File "imapfilter.py", line 789, in ?
    run()
  File "imapfilter.py", line 775, in run
    imap_filter.Train()
  File "imapfilter.py", line 607, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "imapfilter.py", line 536, in Train
    classifier.unlearn(msg.asTokens(), not isSpam)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 283, in unlearn
    self._remove_msg(wordstream, is_spam)
  File "/data/copland1/chance/python/site-packages/spambayes/classifier.py", line 424, in _remove_msg
    raise ValueError("spam count would go negative!")
ValueError: spam count would go negative!
I don't have time to look into this right now, so I just put the old,
corrupt, database back.
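One possible reading of the traceback, and only a guess: the folder is being
trained as ham (the False argument), and the call to unlearn() with
"not isSpam" suggests the message was remembered as previously trained as
spam. A freshly created database starts with a spam count of zero, so there
is nothing to unlearn from. The toy below illustrates that situation. The
names ToyCounts and retrain_as_ham are hypothetical, and this is not the
spambayes classifier, just a sketch of the counting logic.

```python
class ToyCounts:
    """Toy stand-in for the per-class message counters."""

    def __init__(self):
        self.nspam = 0
        self.nham = 0

    def learn(self, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def unlearn(self, is_spam):
        # Refuse to drive either counter below zero.
        if is_spam:
            if self.nspam <= 0:
                raise ValueError("spam count would go negative!")
            self.nspam -= 1
        else:
            if self.nham <= 0:
                raise ValueError("non-spam count would go negative!")
            self.nham -= 1


def retrain_as_ham(counts, previously_spam):
    """Undo an earlier spam training (if any), then train as ham."""
    if previously_spam:
        counts.unlearn(True)
    counts.learn(False)


if __name__ == "__main__":
    fresh = ToyCounts()
    try:
        retrain_as_ham(fresh, previously_spam=True)
    except ValueError as exc:
        print("fresh database:", exc)
```

If that reading is right, whatever records the earlier training would need to
be reset along with hammie.db; where that record lives is not something this
thread establishes.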
>> Tried to classify my Inbox again:
>>
>> [...]
>>
>> File "imapfilter.py", line 157, in _extract_fetch_data
>> mo = FETCH_RESPONSE_RE.match(response)
>> TypeError: expected string or buffer
>>
>> Added code:
>> 157,159d155
>> < if options["globals", "verbose"]:
>> < print "type(response):", type(response)
>> < print "response:", response
>>
>> and repeated the command, but the error did not recur.
>
>
> If it does reoccur, I'd be interested in knowing what the type was. It
> should either be a tuple or a string, and if it's a tuple, the first
> element should be a string and that is used. I haven't seen this before.
Still have not seen this recur, but could the response have come back as None?
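A response of None would certainly produce that kind of TypeError, since a
compiled pattern's match() only accepts strings. As a rough sketch of a guard
that could sit in front of the match, assuming the "string, or tuple whose
first element is a string" rule described above (the name response_text is
hypothetical, and this is written for a current Python rather than 2.2):

```python
def response_text(response):
    """Return the string part of a fetch response, or raise a clear error."""
    if isinstance(response, tuple):
        # Per the description above, the first element should be the string.
        response = response[0] if response else None
    if not isinstance(response, str):
        raise TypeError(
            "unexpected fetch response type: %s" % type(response).__name__
        )
    return response


if __name__ == "__main__":
    print(response_text(("22 (UID 100 RFC822 {13605}", "body")))
    try:
        response_text(None)
    except TypeError as exc:
        print(exc)
```

This does not fix anything by itself, but if the error does recur the message
would name the offending type directly instead of failing inside the regex
call.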
Thanks for your response,
Don
--
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Don Chance
Computer Sciences Corp.
Space Telescope Science Institute
3700 San Martin Dr.
Baltimore, MD 21218
410-338-4941
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/