[Spambayes] LONG POST: Log file and Outlook Rules analysis

Hadar Pedhazur hadar at unorthodox.com
Wed Jun 4 09:15:29 EDT 2003


OK, Tim and Mark, thanks again for the excellent suggestions
for what to track down next. In no particular order, here is
what I've learned:

1) It's not strictly a "CPU busy" problem. This morning, I
downloaded my mail with the CPU completely idle, and didn't
run anything else until all the mail was in and sorted into
the various mailboxes. The CPU peaked at 65% utilization,
and only at the tail end of all of my messages; for the vast
majority of the messages, it never went above 20%. I still
have the problem, so CPU load doesn't appear to be the
cause, though I can't rule it out entirely.

2) I have 52 Outlook rules. Roughly 10 of them are spam
related (from before SB), so I could drop those. Turning all
of them
off completely (as Tim suggests) would be too painful to
consider at the moment. I deal with many companies each day,
and auto sort email from each of those companies into
different folders, so that I can deal with them in priority
order. If I have to revert to everything coming into one
folder, and then filing them by hand, I would lose too much
time having to scan the lower priority ones. If this is
indeed the problem, that will be too bad for me.

I wonder how the rules can be the problem though, as it
seemed to me that the purpose of the plugin was to look at
every message before the rules saw it, and only hand "clean"
messages off to the rules. Also, I wonder whether I should
forgo the plugin (as wonderful as it is to incrementally
train), and simply use the pop3proxy instead?

3) At Mark's suggestion, I looked at the log. I had 188
messages in my POP3 mailbox on the server waiting to be
downloaded. 40 went into the spam folder, 4 into the unsure
folder, and the rest went into the various folders according
to my rules. 87 of those were caught by SpamAssassin (SA),
but when opened individually, they showed SB scores of at
least .99, with many at 1.0! This of course makes it all the
more frustrating, because the training has obviously learned
my patterns amazingly well!

The problem is indeed in _losing messages_! Out of the 188
messages that came in, there were only 50 messages seen in
the log file. All 50 were processed correctly, meaning they
ended up in the correct folders as spam, ham or unsure after
being processed. The first three messages were all seen, and
were one of each kind. After that, it became sporadic as to
which messages were seen and which weren't. So, it is not
the case that the first 50 were seen, and then none of the
others. The first miss was already at message #4 or #5, then
another message was seen, and so on.
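
For anyone who wants to repeat this comparison, here is a
minimal sketch of the bookkeeping (188 downloaded minus 50
logged leaves 138 unseen). It assumes you already have two
lists of message identifiers, one in download order and one
pulled from the log; the function name find_missed and the
"msg-N" identifiers are made up for illustration, and how
you extract identifiers from the real log is not shown here.

```python
def find_missed(downloaded, seen):
    """Return (position, message_id) pairs for every downloaded
    message that never shows up in the log."""
    seen = set(seen)
    return [
        (position, message_id)
        for position, message_id in enumerate(downloaded, start=1)
        if message_id not in seen
    ]


# Hypothetical data: ten downloaded messages, five of them logged.
downloaded = [f"msg-{n}" for n in range(1, 11)]
seen = ["msg-1", "msg-2", "msg-3", "msg-6", "msg-9"]

missed = find_missed(downloaded, seen)
for position, message_id in missed:
    print(f"#{position}: {message_id} was never seen by the filter")
```

Listing the positions rather than just the count makes it
easy to tell a sporadic pattern (as in my case) from a
"first N seen, rest dropped" pattern.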

Finally, I looked at previous log files, to see if there
was any pattern to the number of messages in each file. It
turns out that a number of the previous logs contain Python
errors. Here is one example:

pythoncom error: Python error invoking COM method.
Traceback (most recent call last):
  File "E:\src\pythonex\com\win32com\server\policy.py", line 275, in _Invoke_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 280, in _invoke_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 541, in _invokeex_
  File "E:\src\spambayes\Outlook2000\addin.py", line 352, in OnClick
  File "E:\src\spambayes\Outlook2000\train.py", line 46, in train_message
  File "E:\src\spambayes\Outlook2000\msgstore.py", line 641, in GetEmailPackageObject
  File "E:\src\python-cvs\lib\email\__init__.py", line 52, in message_from_string
  File "E:\src\python-cvs\lib\email\Parser.py", line 75, in parsestr
  File "E:\src\python-cvs\lib\email\Parser.py", line 64, in parse
  File "E:\src\python-cvs\lib\email\Parser.py", line 245, in _parsebody
email.Errors.BoundaryError: multipart message with no defined boundary
E:\src\python-cvs\lib\fcntl.py:7: DeprecationWarning: the FCNTL module is deprecated; please use fcntl

It shows the particular text of the message before this
traceback. I'm happy to forward that (or the whole log) to
anyone who cares to see it. There's nothing in it that I
can't share.
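
As an aside, the kind of message that triggers this error
can be reproduced in a few lines. This is only a sketch, and
it uses the email package from a current Python rather than
the 2003 version in the traceback: the current parser does
not raise by default, but records a
NoBoundaryInMultipartDefect on the message instead. The
sample message and the helper name parse_and_check are
invented for illustration.

```python
import email
from email import errors

# A message that claims to be multipart but defines no boundary.
RAW = (
    "From: sender@example.com\n"
    "Subject: broken multipart\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "body text with no boundary markers\n"
)


def parse_and_check(raw):
    """Parse raw text and report whether the parser flagged a
    multipart message that has no boundary parameter."""
    msg = email.message_from_string(raw)
    no_boundary = any(
        isinstance(defect, errors.NoBoundaryInMultipartDefect)
        for defect in msg.defects
    )
    return msg, no_boundary


msg, no_boundary = parse_and_check(RAW)
if no_boundary:
    print("multipart message with no defined boundary")
```

A caller that checks for the defect (or, with the older
parser, catches the exception) can still score or skip such
a message instead of aborting the whole operation.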

For clarification, today's log has no errors in it, just
missing messages.

Sorry for the long post, but this is so effective in scoring
spam, and I somehow feel so close to deriving the benefit
from it, that I am reaching out with as much info as I can
give in the hope of getting this one nailed.

Thanks for reading this far!


