[Spambayes-checkins] spambayes pop3proxy.py,1.23,1.24
Richie Hindle
richiehindle@users.sourceforge.net
Tue Nov 26 16:22:14 2002
Update of /cvsroot/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv30178
Modified Files:
pop3proxy.py
Log Message:
o You can now train on mbox files through the web interface.
o Automatically save after training. This can be slow, but we get nasty
consequences from not doing it.
o Also removed the "Shutdown without saving" button, and moved the "Save"
button to the footer - the "Save" button should be all-but-redundant
now, but I've left it in out of paranoia.
o Updated the training functions to account for the new Classifier API.
o Improve the look-n-feel of the training interface, especially on the
Mac, by centring the radio buttons using the more-universally-accepted
<center> tag and by spreading them out a little more.
o Replaced instances of "X-Hammie-Disposition" in comments with the new
"X-Spambayes-Classification".
o Forced the test code to always use pickles.
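
For readers following the first item above, the single-message-or-mbox check
can be sketched in isolation. This is a simplified illustration in modern
Python, not the committed code: the commit itself relies on
mailbox.PortableUnixMailbox, whereas the hypothetical helper split_messages
below treats any line beginning with "From " as a separator (looser than a
real mbox parser) and drops those separator lines from the returned messages.

```python
import re


def split_messages(content):
    """Return a list of raw message strings from uploaded or pasted content.

    Hypothetical helper for illustration only; not part of pop3proxy.py.
    """
    # Convert platform-specific line endings into unix-style.
    content = content.replace('\r\n', '\n').replace('\r', '\n')

    # Content that does not start with an mbox "From " line is assumed
    # to be exactly one message.
    if not content.startswith('From '):
        return [content]

    # Assumption: every line starting with "From " is a message separator.
    # Real mbox parsers validate the separator line more strictly.
    parts = re.split(r'(?m)^From .*\n', content)

    # Discard the empty piece that precedes the first separator.
    return [part for part in parts if part.strip()]


if __name__ == '__main__':
    sample = ("From a@example.com Sat Jan  1 00:00:00 2000\r\n"
              "Subject: one\r\n\r\nfirst body\r\n\r\n"
              "From b@example.com Sat Jan  1 00:00:00 2000\r\n"
              "Subject: two\r\n\r\nsecond body\r\n")
    for number, message in enumerate(split_messages(sample), 1):
        print(number, message.splitlines()[0])
```

Each returned string could then be tokenized and trained on in turn, which
is the shape of the loop the new onTrain uses.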
Index: pop3proxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v
retrieving revision 1.23
retrieving revision 1.24
diff -C2 -d -r1.23 -r1.24
*** pop3proxy.py 26 Nov 2002 04:27:19 -0000 1.23
--- pop3proxy.py 26 Nov 2002 16:22:11 -0000 1.24
***************
*** 2,8 ****
"""A POP3 proxy that works with classifier.py, and adds a simple
! X-Hammie-Disposition header (Yes/No/Unsure) to each incoming email.
! You point pop3proxy at your POP3 server, and configure your email
! client to collect mail from the proxy then filter on the added
header. Usage:
--- 2,8 ----
"""A POP3 proxy that works with classifier.py, and adds a simple
! X-Spambayes-Classification header (ham/spam/unsure) to each incoming
! email. You point pop3proxy at your POP3 server, and configure your
! email client to collect mail from the proxy then filter on the added
header. Usage:
***************
*** 31,35 ****
written out to _pop3proxy.log for each run.
! To make rebuilding the database easier, trained messages are appended
to _pop3proxyham.mbox and _pop3proxyspam.mbox.
"""
--- 31,35 ----
written out to _pop3proxy.log for each run.
! To make rebuilding the database easier, uploaded messages are appended
to _pop3proxyham.mbox and _pop3proxyspam.mbox.
"""
***************
*** 60,63 ****
--- 60,65 ----
o [Francois Granger] Show the raw spambrob number close to the buttons
(this would mean using the extra X-Hammie header by default).
+ o Add Today and Refresh buttons on the Review page.
+ o "There are no untrained messages to display. Return Home."
***************
*** 69,74 ****
o Can it cleanly dynamically update its status display while having a
POP3 converation? Hammering reload sucks.
- o Add a command to save the database without shutting down, and one to
- reload the database.
o Save the stats (num classified, etc.) between sessions.
o "Reload database" button.
--- 71,74 ----
***************
*** 84,92 ****
the training code update (rather than replace!) the database.
o Allow use of the UI without the POP3 proxy.
! o Remove any existing X-Hammie-Disposition header from incoming emails.
o Whitelist.
o Online manual.
o Links to project homepage, mailing list, etc.
o Edit settings through the web.
--- 84,94 ----
the training code update (rather than replace!) the database.
o Allow use of the UI without the POP3 proxy.
! o Remove any existing X-Spambayes-Classification header from incoming
! emails.
o Whitelist.
o Online manual.
o Links to project homepage, mailing list, etc.
o Edit settings through the web.
+ o List of words with stats (it would have to be paged!) a la SpamSieve.
***************
*** 115,123 ****
o Zoe...!
"""
import os, sys, re, operator, errno, getopt, string, cStringIO, time, bisect
import socket, asyncore, asynchat, cgi, urlparse, webbrowser
! import storage, tokenizer, mboxutils
from FileCorpus import FileCorpus, FileMessageFactory, GzipFileMessageFactory
from email.Iterators import typed_subpart_iterator
--- 117,138 ----
o Zoe...!
+ Notes, for the sake of somewhere better to put them:
+
+ Don't proxy spams at all? This would mean writing a full POP3 client
+ and server - it would download all your mail on a timer and serve to you
+ all the non-spams. It could be 'safe' in that it leaves the messages in
+ the real POP3 account until you collect them from it (or in the case of
+ spams, until you collect contemporaneous hams). The web interface would
+ then present all the spams so that you could correct any FPs and mark
+ them for collection. The thing is no longer a proxy (because the first
+ POP3 command in a conversion is STAT or LIST, which tells you how many
+ mails there are - it wouldn't know the answer, and finding out could
+ take weeks over a modem - I've already had problems with clients timing
+ out while the proxy was downloading stuff from the server).
"""
import os, sys, re, operator, errno, getopt, string, cStringIO, time, bisect
import socket, asyncore, asynchat, cgi, urlparse, webbrowser
! import mailbox, storage, tokenizer, mboxutils
from FileCorpus import FileCorpus, FileMessageFactory, GzipFileMessageFactory
from email.Iterators import typed_subpart_iterator
***************
*** 298,301 ****
--- 313,329 ----
return False
+ ## This is an attempt to solve the problem whereby the email client
+ ## times out and closes the connection but the ServerLineReader is still
+ ## connected, so you get errors from the POP3 server next time because
+ ## there's already an active connection. But after introducing this,
+ ## I kept getting unexplained "Bad file descriptor" errors in recv.
+ ##
+ ## def handle_close(self):
+ ## """If the email client closes the connection unexpectedly, eg.
+ ## because of a timeout, close the server connection."""
+ ## self.serverSocket.shutdown(2)
+ ## self.serverSocket.close()
+ ## self.close()
+
def collect_incoming_data(self, data):
"""Asynchat override."""
***************
*** 598,602 ****
footer = """</div>
! <form action='shutdown' method='POST'>
<table width='100%%' cellspacing='0'>
<tr><td class='banner'> <a href='home'>Spambayes Proxy</a>,
--- 626,630 ----
footer = """</div>
! <form action='save' method='POST'>
<table width='100%%' cellspacing='0'>
<tr><td class='banner'> <a href='home'>Spambayes Proxy</a>,
***************
*** 608,614 ****
</body></html>\n"""
! shutdownDB = """<input type='submit' name='how' value='Shutdown'>"""
!
! shutdownPickle = shutdownDB + """
<input type='submit' name='how' value='Save & shutdown'>"""
--- 636,640 ----
</body></html>\n"""
! saveButtons = """<input type='submit' name='how' value='Save'>
<input type='submit' name='how' value='Save & shutdown'>"""
***************
*** 626,632 ****
Total emails trained: Spam: <b>%(nspam)d</b>
Ham: <b>%(nham)d</b><br>
- <form action='save' method='POST'>
- <input type='submit' value='Save database'>
- </form>
"""
--- 652,655 ----
***************
*** 667,673 ****
upload = """<form action='%s' method='POST'
enctype='multipart/form-data'>
! Either upload a message file:
<input type='file' name='file' value=''><br>
! Or paste the whole message (incuding headers) here:<br>
<textarea name='text' rows='3' cols='60'></textarea><br>
%s
--- 690,696 ----
upload = """<form action='%s' method='POST'
enctype='multipart/form-data'>
! Either upload a message %s file:
<input type='file' name='file' value=''><br>
! Or paste one whole message (incuding headers) here:<br>
<textarea name='text' rows='3' cols='60'></textarea><br>
%s
***************
*** 676,684 ****
uploadSumbit = """<input type='submit' name='which' value='%s'>"""
! train = upload % ('train',
(uploadSumbit % "Train as Spam") + " " + \
(uploadSumbit % "Train as Ham"))
! classify = upload % ('classify', uploadSumbit % "Classify")
def __init__(self, clientSocket, socketMap=asyncore.socket_map):
--- 699,707 ----
uploadSumbit = """<input type='submit' name='which' value='%s'>"""
! train = upload % ('train', "or mbox",
(uploadSumbit % "Train as Spam") + " " + \
(uploadSumbit % "Train as Ham"))
! classify = upload % ('classify', "", uploadSumbit % "Classify")
def __init__(self, clientSocket, socketMap=asyncore.socket_map):
***************
*** 760,770 ****
# This is a request for a valid page; run the handler.
self.pushOKHeaders('text/html')
! self.pushPreamble(name, showImage=(name != 'Shutdown'))
handler(params)
timeString = time.asctime(time.localtime())
! if state.useDB:
! self.push(self.footer % (timeString, self.shutdownDB))
! else:
! self.push(self.footer % (timeString, self.shutdownPickle))
def pushOKHeaders(self, contentType, extraHeaders={}):
--- 783,791 ----
# This is a request for a valid page; run the handler.
self.pushOKHeaders('text/html')
! isKill = (params.get('how', '').lower().find('shutdown') >= 0)
! self.pushPreamble(name, showImage=(not isKill))
handler(params)
timeString = time.asctime(time.localtime())
! self.push(self.footer % (timeString, self.saveButtons))
def pushOKHeaders(self, contentType, extraHeaders={}):
***************
*** 832,836 ****
def doSave(self):
! """Saves the database. Worker for onSave and onShutdown."""
self.push("<b>Saving... ")
self.push(' ')
--- 853,857 ----
def doSave(self):
! """Saves the database."""
self.push("<b>Saving... ")
self.push(' ')
***************
*** 839,878 ****
def onSave(self, params):
! """Command handler for "Save"."""
self.doSave()
!
! def onShutdown(self, params):
! """Shutdown the server, saving the pickle if requested to do so."""
! if params['how'].lower().find('save') >= 0:
! self.doSave()
! self.push("<b>Shutdown</b>. Goodbye.</div></body></html>")
! self.push(' ')
! self.shutdown(2)
! self.close()
! raise SystemExit
def onTrain(self, params):
"""Train on an uploaded or pasted message."""
# Upload or paste? Spam or ham?
! message = params.get('file') or params.get('text')
isSpam = (params['which'] == 'Train as Spam')
! # Append the message to a file, to make it easier to rebuild
# the database later. This is a temporary implementation -
# it should keep a Corpus of trained messages.
- message = message.replace('\r\n', '\n').replace('\r', '\n') # For Macs
if isSpam:
f = open("_pop3proxyspam.mbox", "a")
else:
f = open("_pop3proxyham.mbox", "a")
- f.write("From pop3proxy@spambayes.org Sat Jan 31 00:00:00 2000\n")
- f.write(message)
- f.write("\n\n")
- f.close()
! # Train on the message.
! tokens = tokenizer.tokenize(message)
! state.bayes.learn(tokens, isSpam, True)
! self.push("<p>OK. Return <a href='home'>Home</a> or train another:</p>")
self.push(self.pageSection % ('Train another', self.train))
--- 860,916 ----
def onSave(self, params):
! """Command handler for "Save" and "Save & shutdown"."""
self.doSave()
! if params['how'].lower().find('shutdown') >= 0:
! self.push("<b>Shutdown</b>. Goodbye.</div></body></html>")
! self.push(' ')
! self.shutdown(2)
! self.close()
! raise SystemExit
def onTrain(self, params):
"""Train on an uploaded or pasted message."""
# Upload or paste? Spam or ham?
! content = params.get('file') or params.get('text')
isSpam = (params['which'] == 'Train as Spam')
! # Convert platform-specific line endings into unix-style.
! content = content.replace('\r\n', '\n').replace('\r', '\n')
!
! # Single message or mbox?
! if content.startswith('From '):
! # Get a list of raw messages from the mbox content.
! class SimpleMessage:
! def __init__(self, fp):
! self.guts = fp.read()
! contentFile = cStringIO.StringIO(content)
! mbox = mailbox.PortableUnixMailbox(contentFile, SimpleMessage)
! messages = map(lambda m: m.guts, mbox)
! else:
! # Just the one message.
! messages = [content]
!
! # Append the message(s) to a file, to make it easier to rebuild
# the database later. This is a temporary implementation -
# it should keep a Corpus of trained messages.
if isSpam:
f = open("_pop3proxyspam.mbox", "a")
else:
f = open("_pop3proxyham.mbox", "a")
! # Train on the uploaded message(s).
! self.push("<b>Training...</b>\n")
! self.push(' ')
! for message in messages:
! tokens = tokenizer.tokenize(message)
! state.bayes.learn(tokens, isSpam)
! f.write("From pop3proxy@spambayes.org Sat Jan 31 00:00:00 2000\n")
! f.write(message)
! f.write("\n\n")
!
! # Save the database and return a link Home and another training form.
! f.close()
! self.doSave()
! self.push("<p>OK. Return <a href='home'>Home</a> or train again:</p>")
self.push(self.pageSection % ('Train another', self.train))
***************
*** 934,941 ****
def appendMessages(self, lines, keyedMessages, judgement):
"""Appends the lines of a table of messages to 'lines'."""
! buttons = """<input type='radio' name='classify:%s' value='discard'>
! <input type='radio' name='classify:%s' value='defer' %s>
! <input type='radio' name='classify:%s' value='ham' %s>
! <input type='radio' name='classify:%s' value='spam' %s>"""
stripe = 0
for key, message in keyedMessages:
--- 972,980 ----
def appendMessages(self, lines, keyedMessages, judgement):
"""Appends the lines of a table of messages to 'lines'."""
! buttons = \
! """<input type='radio' name='classify:%s' value='discard'>
! <input type='radio' name='classify:%s' value='defer' %s>
! <input type='radio' name='classify:%s' value='ham' %s>
! <input type='radio' name='classify:%s' value='spam' %s>"""
stripe = 0
for key, message in keyedMessages:
***************
*** 970,974 ****
stripeClass = ['stripe_on', 'stripe_off'][stripe]
lines.append("""<tr class='%s'><td>%s</td><td>%s</td>
! <td align='middle'>%s</td></tr>""" % \
(stripeClass, subject, from_, radioGroup))
stripe = stripe ^ 1
--- 1009,1013 ----
stripeClass = ['stripe_on', 'stripe_off'][stripe]
lines.append("""<tr class='%s'><td>%s</td><td>%s</td>
! <td><center>%s</center></td></tr>""" % \
(stripeClass, subject, from_, radioGroup))
stripe = stripe ^ 1
***************
*** 1006,1010 ****
pass # Must be a reload.
! # Report on any training.
if numTrained > 0:
plural = ''
--- 1045,1049 ----
pass # Must be a reload.
! # Report on any training, and save the database if there was any.
if numTrained > 0:
plural = ''
***************
*** 1012,1015 ****
--- 1051,1056 ----
plural = 's'
self.push("Trained on %d message%s. " % (numTrained, plural))
+ self.doSave()
+ self.push("<br> ")
# If any messages were deferred, show the same page again.
***************
*** 1196,1199 ****
--- 1237,1241 ----
print "Loading database...",
if self.isTest:
+ self.useDB = True
self.databaseFilename = '_pop3proxy_test.pickle' # Never saved
if self.useDB: