[Spambayes] Problems getting started with IMAP...

Tony Meyer tameyer at ihug.co.nz
Fri Apr 23 03:55:05 EDT 2004


[End of the week, and I finally get time to look at all of this.  Apologies
for the delay, and thanks for working through things yourself while I was
busy.]

> It doesn't seem to see the dbm module, so it must not
> be in our installation (Solaris 2.6) and I don't
> know much about Python, so I'm not sure how to add it
> in. Since the "-p" option works, I'm fine.

In some ways the -p option is better, anyway (faster to score, slower to
load/train), and you can be fairly certain that you'll avoid any corruption
problems, too.

> Here's what the browser shows, when I pick the "More
> Statistics..." link:
[...]
>    File "/users/pcm/lib/python/spambayes/Stats.py", line 58, in 
> CalculateStats
>      for msg in msginfoDB.db.keys():
> 
> AttributeError: 'NoneType' object has no attribute 'keys'

Ah, I think I know what this is - a bug I introduced with 1.0b1 (IIRC that's
what you're using) or 1.0a9.  It'd probably only be noticeable if you don't
have dbm available.  Try the following patch to spambayes/message.py:

"""
--- 205,215 ----
  # so that these files don't litter lots of working directories.
  # Once there is a master db, this option can be removed.
  message_info_db_name = get_pathname_option("Storage",
                                             "messageinfo_storage_file")
! if options["Storage", "persistent_use_database"] is True or \
!    options["Storage", "persistent_use_database"] == "dbm":
      msginfoDB = MessageInfoDB(message_info_db_name)
! elif options["Storage", "persistent_use_database"] is False or \
!      options["Storage", "persistent_use_database"] == "pickle":
      msginfoDB = MessageInfoPickle(message_info_db_name)

  class Message(email.Message.Message):
"""
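
Judging from the patch, the option value can turn up either as a boolean or
as a string ("dbm" / "pickle"), and when neither branch matches, msginfoDB
is left as None, which is what the traceback above shows.  Here is a
standalone sketch of the same selection logic.  The two classes are empty
stand-ins rather than the real ones, choose_backend is a made-up helper, and
raising on an unknown value is my own addition rather than part of the patch:

```python
# Stand-in classes; the real ones live in spambayes/message.py.
class MessageInfoDB:
    def __init__(self, name):
        self.name = name


class MessageInfoPickle:
    def __init__(self, name):
        self.name = name


def choose_backend(use_database, name):
    """Pick a storage backend from a boolean or string option value."""
    if use_database is True or use_database == "dbm":
        return MessageInfoDB(name)
    if use_database is False or use_database == "pickle":
        return MessageInfoPickle(name)
    # Fail loudly instead of silently leaving the backend as None.
    raise ValueError(
        "unrecognised persistent_use_database value: %r" % (use_database,))
```

With this shape, an unexpected option value shows up at startup rather than
later as an AttributeError on None.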

> The stats would be nice, and training on incoming messages,
> like the POP3 one would be awesome!

Note that the POP3 proxy does lean towards 'train on everything', which is
almost certainly not the best training regime.  The IMAP filter probably
leans towards 'train on mistakes', which is a reasonable method.
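
To pin down the difference between the two regimes, here is a rough sketch.
The function name and the regime strings are made up for illustration and
are not SpamBayes options, and treating an "unsure" result as a mistake is
an assumption on my part:

```python
def should_train(regime, predicted, actual):
    """Decide whether a message should be trained on.

    regime    -- "everything" or "mistakes" (hypothetical names)
    predicted -- what the classifier said: "ham", "spam" or "unsure"
    actual    -- what the user says it really is: "ham" or "spam"
    """
    if regime == "everything":
        return True
    if regime == "mistakes":
        # Assumption: an "unsure" result counts as a mistake, because it
        # never equals the user's "ham" or "spam" verdict.
        return predicted != actual
    raise ValueError("unknown regime: %r" % (regime,))
```

Under "mistakes", a message scored as spam that really is spam is left
alone, while one scored as unsure or ham gets trained.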

Note that you can set the filter to train on the same folders you are
filtering/moving spam to, which would mean that training was automatically
done on all incoming mail.  I suppose a review page, like the POP3 proxy
one, could be built that presented the messages that were on the IMAP
server.  This would be a reasonable chunk of work, though.  If you really
would like that, then open a feature request
<http://sf.net/projects/spambayes> and assign it to me, and I'll get to it
some day.

> Hmm. With IMAP though, when I look into my mail directory on
> my machine, I don't see the full messages. It almost looks
> like some kind of index or summary? Would I need to move the
> messages to a local folder and then "upload" them?

If there were a review-page-style training system for IMAP, then I'd get it
to ignore anything local (just like the POP3 one does, in many ways), and
ask the server for a list of messages in certain folders.  It would then
need to download any that are used for training.

> OK. So I got a message for prescription drugs (love those :^).
> It was marked as "unsure" and was placed into my Unsure
> folder. I moved it to my Spam folder (used for training).
> Now it just sits there and the classification never changes
> to "spam". I have to move it to my inbox and then it will
> get reclassified and moved to the spam folder.

It does get trained as spam, though.  I suppose I could add a 'rescore after
training' option.  Would that be useful?

> Here's what I'll try:
> 
> ham_train_folders:INBOX.Bayesian.MarkAsHam
> move_trained_ham_to_folder:INBOX
> spam_folder:INBOX.Bayesian.Spam
> spam_train_folders:INBOX.Bayesian.MarkAsSpam
> move_trained_spam_to_folder:INBOX.Bayesian.Spam
> unsure_folder:INBOX.Bayesian.Unsure
> filter_folders:INBOX

This looks fine.

> Is there an option for "ham_folder"? I'm just wondering if
> I should have the Inbox filtered to a Ham, Spam, or Unsure
> folder, rather than having the Inbox filtered every 10
> minutes, since I typically leave messages in the Inbox for
> a while (which I could in this case, leave them in the
> Ham folder). Is there a better way than what I have here?

I've wondered about a 'ham folder'.  The IMAP filter is modelled loosely on
the Outlook plug-in, and people occasionally ask for a 'ham folder' there,
too.  It would be easy enough to add this, so if you think it would be a
good option, then, again, open a feature request
<http://sf.net/projects/spambayes> and assign it to me.  I'd get to this
before I got to the above.

> It seems, and I haven't verified it yet, that when the filter
> is running, I can only see the headers from my client. For
> example, clicking on the message won't show the body. Once
> the filter goes to sleep, I can access the message.
[...]
> Verified. I cannot see the message bodies in the Inbox, when 
> the sb_imapfilter.py is running (once it sleeps it allows my 
> client to see the folder).

Interesting.  I'm not sure why this would be.  Maybe the server locks the
folder in some way?  Does this cause any problems?

[New messages appearing]
> What I changed in the code, was I allowed another option 
> "ham_folder" and in the classifying code, I move all ham to 
> this folder.  Now, mail comes into my inbox and within two 
> minutes, it gets moved to the spam, ham, or unsure folders. I 
> always read from the ham folder and not the inbox.

I suspect that your reasoning is correct, and apart from simply having the
IMAP filter run more often, the ham_folder option may be the best solution.
Since this does fix it, it sounds good to me.

> >>    I also had one case where a message was in
> >>    my INBOX, I read it and replied to the sender.
[...]   
> >>    When I looked
> >>    back at the INBOX a few minutes later, the message
> >>    was gone!  Ouch!
> > 
> > I presume it was filtered and classified as unsure/spam?
> 
> I don't think it was, as it was in my Inbox and not in my
> Spam or Unsure folder.
[...]
> I can try a -l 1 later on and see what happens. I haven't
> seen a message lost since, but I do see this message
> appears as new twice thing, all the time.

Let me know if this does happen again.  I don't know why a message would
vanish, unless somehow it was meant to be moved and the old version got
deleted but creation of the new one failed.  Imapfilter should crash in that
case, though, which would make it obvious (and the crash should come
*before* the deletion part, too).
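
The ordering I mean (copy first, and only flag the original as deleted once
the copy has succeeded) looks roughly like this with Python's imaplib.  This
is a sketch, not the actual sb_imapfilter.py code, and safe_move is a
made-up name:

```python
import imaplib


def safe_move(conn, msg_id, dest_folder):
    """Copy a message to dest_folder, then flag the original as deleted.

    conn is an imaplib.IMAP4 (or IMAP4_SSL) object with a folder already
    selected; msg_id is a message sequence number as a string.
    """
    status, data = conn.copy(msg_id, dest_folder)
    if status != "OK":
        # Stop here so the original is never flagged when the copy failed.
        raise imaplib.IMAP4.error(
            "copy to %s failed: %r" % (dest_folder, data))
    status, data = conn.store(msg_id, "+FLAGS", "(\\Deleted)")
    if status != "OK":
        raise imaplib.IMAP4.error(
            "could not flag %s as deleted: %r" % (msg_id, data))
```

If the copy fails, the exception is raised before anything is flagged, so
the original message stays untouched in its folder.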

> I guess what I was asking was that if a message was classified
> and then a new copy of the message was created, say marked as
> Spam, and the original was still there marked as deleted, should
> I be able to see the original message somewhere (trash)? I
> don't see the message anywhere.  I have my client configured to
> "when I delete a message, move it to the trash folder".  I'm
> just trying to see if there is a way to recover that message
> that was lost.

Hmm.  I don't know enough about how Netscape presents IMAP mail.  What
happens on the server is that the message stays where it is, but is marked as
deleted.  You later do an 'expunge' and all those messages are deleted.  Do
you end up with lots of duplicate messages in your trash?  If so, then I
guess that Netscape is simply presenting all the 'marked as deleted'
messages.  If not, then I guess it's actually copying the message to a
'trash' folder.

If the latter, then as long as an expunge hasn't been run (e.g.
automatically by Netscape), the message is still there.  Somehow you should
be able to convince Netscape to let you see the messages that are marked for
deletion.  Alternatively, do you have any other way of (temporarily)
connecting to the IMAP server?  (e.g. another mail client, webmail,
something like that?)
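
If Python is handy, a few lines of imaplib can also show which messages in a
folder are flagged as deleted but not yet expunged.  This is only a sketch:
deleted_message_numbers is a made-up helper, and the host, user name and
password in the usage comment are placeholders:

```python
import imaplib


def deleted_message_numbers(conn, folder="INBOX"):
    """Return sequence numbers of messages flagged \\Deleted in folder."""
    # readonly=True so that just looking cannot change any flags.
    status, data = conn.select(folder, readonly=True)
    if status != "OK":
        raise imaplib.IMAP4.error("cannot select %s: %r" % (folder, data))
    status, data = conn.search(None, "DELETED")
    if status != "OK":
        raise imaplib.IMAP4.error("search failed: %r" % (data,))
    # data is a list holding one space-separated bytes string of numbers.
    return data[0].split() if data and data[0] else []


# Usage (placeholder host and credentials):
# conn = imaplib.IMAP4_SSL("imap.example.com")
# conn.login("user", "password")
# print(deleted_message_numbers(conn, "INBOX"))
```

To bring a message back, the folder would need to be selected without
readonly, and the flag removed with conn.store(num, "-FLAGS", "(\\Deleted)")
before anything expunges it.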

> It "seems" like I can run a second instance of 
> sb_imapfilter.py with the -b option. I'm not sure if they interact.

I think they've both got the databases open, though, and we don't really
make any effort at the moment to avoid any conflicts.  I think adding an
option to serve the interface while imapfilter is running with -l would be
good.  I've opened a feature request for this, and will add it at some
point.

> One problem that I see, is that I may not get an
> indication from my mail client that there is "new"
> mail, as it may already have been moved to the ham
> folder, before the client polls for new messages. Not
> a big issue.

Won't Netscape tell you that there's new mail in the 'ham' folder?

> The other thing is I'm seeing that the imapfilter process
> is exiting occasionally. I'm running it with the -v and
> in the foreground to see if I can see what causes it to
> exit.

If it does exit, it should do so with a traceback.  Let me know what it is,
and I should be able to fix it.

> Please let me know, if you think i can do this differently,
> or if you have any questions. I can send you the code
> changes, but they are trivial, and I don't know Python, 
> so...you may want to just use the idea.

Code is good.  Even if it just saves me five minutes :)  Given that it
should be (and you say it is) trivial, then you've probably done it how I
would, anyway.

> Also, how do I
> precompile a Python library? I see that there is a
> Options.py and Options.pyc and I had to change the .py, but I
> don't know how to create the .pyc file. I'm assuming this is
> a precompiled script?

The .pyc files are the .py files compiled into Python bytecode.  This happens
automatically when a module is imported (it's handled by Python itself).  You
don't need to do anything with the .pyc files, since they just get recreated
on the fly as necessary.  It's safe to ignore them except under very unusual
circumstances.
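
If you ever do want to force the compilation by hand, the standard library's
py_compile module will do it.  A minimal sketch (compile_module is just a
made-up wrapper name):

```python
import py_compile


def compile_module(source_path):
    """Byte-compile one .py file and return the path of the bytecode file."""
    # doraise=True turns a syntax error into a PyCompileError exception
    # instead of just printing it.
    return py_compile.compile(source_path, doraise=True)


# Usage: compile_module("Options.py")
```

Where the bytecode file ends up depends on the Python version (next to the
.py in older versions, in a __pycache__ directory in newer ones).  To compile
a whole directory, "python -m compileall ." does the same job.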

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.



