[Spambayes] Upgrade problem

Wed Nov 6 21:42:27 2002

Tim Stone - Four Stones Expressions wrote:

> This is why you keep a corpus.  This is pre-alpha code, and anything that 
> anyone does at any time can screw the world up.  You should simply delete your 
> database and retrain it.  If you don't have a corpus, go ahead and make one 
> now... <wink>

Alright, this triggered a feature request in me, which resulted in some hacking
activity <wink>. The patch below appends each message trained through the web
interface of pop3proxy.py to one of two mbox files ('_pop3proxyspam.mbox' for
spam, '_pop3proxyham.mbox' for ham). That makes it easier to rebuild the
database from scratch later, while you can still train ad hoc through the web
interface. Good idea?

Just

Index: pop3proxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v
retrieving revision 1.10
diff -c -r1.10 pop3proxy.py
*** pop3proxy.py    5 Nov 2002 22:18:56 -0000   1.10
--- pop3proxy.py    6 Nov 2002 21:37:03 -0000
***************
*** 608,615 ****
          raise SystemExit

      def onUpload(self, params):
!         message = params.get('file') or params.get('text')            
          isSpam = (params['which'] == 'spam')
          self.bayes.learn(tokenizer.tokenize(message), isSpam, True)
          self.push("""<p>Trained on your message. Saving database...</p>""")
          self.push(" ")  # Flush... must find out how to do this properly...
--- 608,626 ----
          raise SystemExit

      def onUpload(self, params):
!         message = params.get('file') or params.get('text')
          isSpam = (params['which'] == 'spam')
+         # Append the message to a file, to make it easier to rebuild
+         # the database later.
+         message = message.replace('\r\n', '\n').replace('\r', '\n')
+         if isSpam:
+             f = open("_pop3proxyspam.mbox", "a")
+         else:
+             f = open("_pop3proxyham.mbox", "a")
+         f.write("From ???@???\n")  # fake From line (XXX good enough?)
+         f.write(message)
+         f.write("\n")
+         f.close()
          self.bayes.learn(tokenizer.tokenize(message), isSpam, True)
          self.push("""<p>Trained on your message. Saving database...</p>""")
          self.push(" ")  # Flush... must find out how to do this properly...
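
On the "XXX good enough?" question in the patch: a body line that itself starts
with "From " would look like the start of a new message to an mbox reader. Below
is a minimal standalone sketch of one way to guard against that, written for
current Python 3 rather than the 2002 code base. The helper names
append_to_mbox and iter_mbox_texts and the MAILER-DAEMON placeholder sender are
assumptions for illustration only; none of them are part of the patch or of
pop3proxy.py.

```python
import mailbox
import time


def append_to_mbox(path, message, sender="MAILER-DAEMON"):
    """Append one raw message to an mbox file (hypothetical helper)."""
    # Normalize line endings the same way the patch does.
    text = message.replace("\r\n", "\n").replace("\r", "\n")
    # Quote any line that starts with "From " so that a reader does not
    # mistake it for a message separator ("mboxo"-style quoting).
    lines = text.split("\n")
    quoted = [">" + line if line.startswith("From ") else line
              for line in lines]
    body = "\n".join(quoted)
    if not body.endswith("\n"):
        body += "\n"
    with open(path, "a", newline="\n") as f:
        # Separator line: "From", a placeholder sender, an asctime-style date.
        f.write("From %s %s\n" % (sender, time.asctime(time.gmtime())))
        f.write(body)
        f.write("\n")  # blank line between messages


def iter_mbox_texts(path):
    """Yield each stored message as a string, e.g. for retraining."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield msg.as_string()
    finally:
        box.close()
```

To rebuild the database, each string yielded by iter_mbox_texts could be passed
to the same call the patch uses, self.bayes.learn(tokenizer.tokenize(message),
isSpam, True), with isSpam chosen according to which of the two files is being
read. Note that the quoted lines come back as ">From ..." and are not unquoted
by this sketch.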