[Spambayes-checkins] spambayes/scripts sb_imapfilter.py,1.30,1.31

Tony Meyer anadelonbrin at users.sourceforge.net
Sat May 15 23:50:03 EDT 2004


Update of /cvsroot/spambayes/spambayes/scripts
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv20591/scripts

Modified Files:
	sb_imapfilter.py 
Log Message:
Sometimes we can't fetch a message at all, which means we can't even add
the exception header as we do with invalid messages.  We don't want imapfilter
to crash, though; it should skip that message, warn the user and keep going.
Add code to allow for this sort of behaviour.

One specific case of this is a MemoryError with really large messages.  Handle
this as above.
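
A minimal sketch of the skip-and-continue behaviour described above, written
in current Python syntax rather than the Python 2.3 of the script itself.
The Message class, process() and fake_fetch() are hypothetical stand-ins, not
the real sb_imapfilter code; only the could_not_retrieve flag name comes from
this change. The sketch also checks the flag after the fetch attempt, which
is a simplification.

```python
import sys


class Message:
    """Hypothetical stand-in for the filter's message class."""

    def __init__(self, uid, fetch):
        self.uid = uid
        self._fetch = fetch              # callable returning the raw text
        self.could_not_retrieve = False
        self.got_substance = False
        self.text = None

    def get_substance(self):
        if self.got_substance:
            return
        try:
            self.text = self._fetch(self.uid)
        except MemoryError:
            # Flag the message and warn; do not let the error propagate.
            self.could_not_retrieve = True
            print("MemoryError with message (uid %s)" % self.uid,
                  file=sys.stderr)
            return
        self.got_substance = True


def process(messages):
    """Return the uids that were fetched, skipping unretrievable ones."""
    handled = []
    for msg in messages:
        msg.get_substance()
        if msg.could_not_retrieve:
            continue
        handled.append(msg.uid)
    return handled


def fake_fetch(uid):
    # Assumption for the demo: uid "2" is too large to load.
    if uid == "2":
        raise MemoryError
    return "body of %s" % uid


msgs = [Message(u, fake_fetch) for u in ("1", "2", "3")]
result = process(msgs)
```

The failing message is reported on stderr and left out of the result, while
the messages on either side of it are still handled.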

Change some error checking to asserts, which they should really have been in
the first place.
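
One pitfall worth keeping in mind when converting checks to asserts: assert
is a statement, not a function, so wrapping the condition and the message in
parentheses tests a two-item tuple, which is always true. The sketch below
illustrates the difference; the function names are hypothetical.

```python
def check_tuple_form(value):
    # What a parenthesised "assert(value, msg)" actually tests:
    # the truth of the (condition, message) pair itself.
    pair = (value, "value must be set")
    return bool(pair)    # a two-item tuple is never empty


def check_statement_form(value):
    # Statement form: the condition is evaluated on its own and the
    # message is only used if it fails.
    assert value, "value must be set"
    return True


tuple_result = check_tuple_form(None)

try:
    check_statement_form(None)
    statement_raised = False
except AssertionError:
    statement_raised = True
```

Note also that asserts are removed when Python runs with -O, so they suit
internal sanity checks rather than validation that must always happen.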

Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.30
retrieving revision 1.31
diff -C2 -d -r1.30 -r1.31
*** sb_imapfilter.py	3 May 2004 02:12:32 -0000	1.30
--- sb_imapfilter.py	16 May 2004 03:50:00 -0000	1.31
***************
*** 338,341 ****
--- 338,342 ----
          self.got_substance = False
          self.invalid = False
+         self.could_not_retrieve = False
  
      def setFolder(self, folder):
***************
*** 377,394 ****
          if self.got_substance:
              return
!         if not self.uid or not self.id:
!             print "Cannot get substance of message without an id and an UID"
!             return
          imap.SelectFolder(self.folder.name)
          try:
!             response = imap.uid("FETCH", self.uid, self.rfc822_command)
!         except IMAP4.error:
!             self.rfc822_command = "RFC822"
!             self.rfc822_key = "RFC822"
!             response = imap.uid("FETCH", self.uid, self.rfc822_command)
!         if response[0] != "OK":
!             self.rfc822_command = "RFC822"
!             self.rfc822_key = "RFC822"
!             response = imap.uid("FETCH", self.uid, self.rfc822_command)
          self._check(response, "uid fetch")
          data = _extract_fetch_data(response[1][0])
--- 378,412 ----
          if self.got_substance:
              return
!         assert self.id, "Cannot get substance of message without an id"
!         assert self.uid, "Cannot get substance of message without a UID"
          imap.SelectFolder(self.folder.name)
          try:
!             try:
!                 response = imap.uid("FETCH", self.uid, self.rfc822_command)
!             except IMAP4.error:
!                 self.rfc822_command = "RFC822"
!                 self.rfc822_key = "RFC822"
!                 response = imap.uid("FETCH", self.uid, self.rfc822_command)
!             if response[0] != "OK":
!                 self.rfc822_command = "RFC822"
!                 self.rfc822_key = "RFC822"
!                 response = imap.uid("FETCH", self.uid, self.rfc822_command)
!         except MemoryError:
!             # Really big messages can trigger a MemoryError here.
!             # The problem seems to be line 311 (Python 2.3) of socket.py,
!             # which has "return "".join(buffers)".
!             # We want to handle this gracefully, although we can't really
!             # do what we do later, and rewrite the message, since we can't
!             # load it in the first place.  Maybe an elegant solution would
!             # be to get the message in parts, or just use the first X
!             # characters for classification.  For now, we just carry on,
!             # warning the user and ignoring the message.
!             self.could_not_retrieve = True
!             print >>sys.stderr, "MemoryError with message %s (uid %s)" % \
!                   (self.id, self.uid)
!             # We could print the traceback, too, but don't for the moment.
!             #traceback.print_exc(None, stream)
!             return
!             
          self._check(response, "uid fetch")
          data = _extract_fetch_data(response[1][0])
***************
*** 476,485 ****
          # we can't actually update the message with IMAP
          # so what we do is create a new message and delete the old one
!         if self.folder is None:
!             raise RuntimeError, """Can't save a message that doesn't
!             have a folder."""
!         if not self.id:
!             raise RuntimeError, """Can't save a message that doesn't have
!             an id."""
          response = imap.uid("FETCH", self.uid, "(FLAGS INTERNALDATE)")
          self._check(response, 'fetch (flags internaldate)')
--- 494,500 ----
          # we can't actually update the message with IMAP
          # so what we do is create a new message and delete the old one
!         assert self.folder is not None, \
!                "Can't save a message that doesn't have a folder."
!         assert self.id, "Can't save a message that doesn't have an id."
          response = imap.uid("FETCH", self.uid, "(FLAGS INTERNALDATE)")
          self._check(response, 'fetch (flags internaldate)')
***************
*** 666,669 ****
--- 681,691 ----
          num_trained = 0
          for msg in self:
+             if msg.could_not_retrieve:
+                 # Something went wrong, and we couldn't even get
+                 # an invalid message, so just skip this one.
+                 # Annoyingly, we'll try to do it every time the
+                 # script runs, but hopefully the user will notice
+                 # the errors and move it soon enough.
+                 continue
              if msg.GetTrained() == (not isSpam):
                  msg.get_substance()
***************
*** 702,705 ****
--- 724,734 ----
          count["unsure"] = 0
          for msg in self:
+             if msg.could_not_retrieve:
+                 # Something went wrong, and we couldn't even get
+                 # an invalid message, so just skip this one.
+                 # Annoyingly, we'll try to do it every time the
+                 # script runs, but hopefully the user will notice
+                 # the errors and move it soon enough.
+                 continue
              if msg.GetClassification() is None:
                  msg.get_substance()



