[Spambayes-checkins] spambayes/spambayes ImapUI.py, NONE,
1.1 ProxyUI.py, NONE, 1.1 UserInterface.py, NONE,
1.1 tokenizer.py, 1.7, 1.8
Tony Meyer
anadelonbrin at users.sourceforge.net
Fri Apr 18 03:24:32 EDT 2003
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv18385/spambayes
Modified Files:
tokenizer.py
Added Files:
ImapUI.py ProxyUI.py UserInterface.py
Log Message:
More modular user interface files (code mostly ripped out
of pop3proxy.py, as the comments indicate it will be).
--- NEW FILE: ImapUI.py ---
"""IMAPFilter Web Interface
Classes:
IMAPUserInterface - Interface class for the IMAP filter
Abstract:
This module implements a browser based Spambayes user interface for the
IMAP filter. Users may use it to interface with the filter - it is
expected that this will primarily be for configuration, although users
may also wish to look up words in the database, or classify a message.
The following functions are currently included:
[From the base class UserInterface]
onClassify - classify a given message
onWordquery - query a word from the database
onTrain - train a message or mbox
onSave - save the database and possibly shutdown
[Here]
onHome - a home page with various options
To do:
o There is a function to get a list of all the folders available on
the server, but nothing is done with this. Obviously what we would
like is to present a page where the user selects (checkboxes) the
folders that s/he wishes to filter, the folders s/he wishes to use
as train-as-ham and train-as-spam, and (radio buttons) the folders
to move suspected spam and unsures into. I think this should be
a separate page from the standard config as it's already going to
be really big if there are lots of folders to choose from.
An alternative design would be to have a single list of the folders
and five columns - three of checkboxes (filter, train-as-spam and
train-as-ham) and two of radio buttons (spam folder and ham folder).
I think this might be more confusing, though.
o This could have a neat review page, like pop3proxy, built up by
asking the IMAP server appropriate questions. I don't know whether
this is needed, however. This would then allow viewing a message,
showing the clues for it, and so on. Finding a message (via the
spambayes id) could also be done.
o Suggestions?
"""
# This module is part of the spambayes project, which is Copyright 2002-3
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
__author__ = "Tony Meyer <ta-meyer at ihug.co.nz>, Tim Stone"
__credits__ = "All the Spambayes folk."
try:
True, False
except NameError:
# Maintain compatibility with Python 2.2
True, False = 1, 0
import re
import UserInterface
from Options import options
global classifier
# This control dictionary maps http request parameters and template fields
# to ConfigParser sections and options. The key matches both the input
# field that corresponds to a section/option, and also the HTML template
# variable that is used to display the value of that section/option.
parm_map = \
{'hamcutoff': ('Categorization', 'ham_cutoff'),
'spamcutoff': ('Categorization', 'spam_cutoff'),
'dbname': ('pop3proxy', 'persistent_storage_file'),
'imapserver': ('imap', 'server'),
'imapport': ('imap', 'port'),
'imapusername': ('imap', 'username'),
'imappassword': ('imap', 'password'),
'p3notateto': ('pop3proxy', 'notate_to'),
'p3notatesub': ('pop3proxy', 'notate_subject'),
'p3addid': ('pop3proxy', 'add_mailid_to'),
'p3stripid': ('pop3proxy', 'strip_incoming_mailids'),
'p3prob': ('pop3proxy', 'include_prob'),
'p3thermostat': ('pop3proxy', 'include_thermostat'),
'p3evidence': ('pop3proxy', 'include_evidence'),
}
display = ('IMAP Options', 'imapserver', 'imapport', 'imapusername',
# to display, or not to display; that is the question
# if we show this here, it's in plain text for everyone to
# see (and worse - if we don't restrict connections to
# localhost, it's available for the world to see)
# on the other hand, we have to be able to enter it somehow...
'imappassword',
'Header Options', 'p3notateto', 'p3notatesub',
'p3prob', 'p3thermostat', 'p3evidence',
'p3addid', 'p3stripid',
'Statistics Options', 'dbname', 'hamcutoff', 'spamcutoff')
class IMAPUserInterface(UserInterface.UserInterface):
"""Serves the HTML user interface for the IMAP filter."""
def __init__(self, cls, imap):
global classifier
UserInterface.UserInterface.__init__(self, cls, parm_map, display)
classifier = cls
self.imap = imap
def onHome(self):
"""Serve up the homepage."""
stateDict = classifier.__dict__.copy()
stateDict.update(classifier.__dict__)
statusTable = self.html.statusTable.clone()
del statusTable.proxyDetails
content = (self._buildBox('Status and Configuration',
'status.gif', statusTable % stateDict)+
self._buildTrainBox() +
self._buildClassifyBox() +
self._buildBox('Word query', 'query.gif',
self.html.wordQuery)
)
self._writePreamble("Home")
self.write(content)
self._writePostamble()
def reReadOptions(self):
"""Called by the config page when the user saves some new options, or
restores the defaults."""
# Reload the options.
global classifier
classifier.store()
import Options
reload(Options)
global options
from Options import options
def _folder_list(self):
'''Return an alphabetical list of all folders available
on the server'''
response = self.imap.list()
if response[0] != "OK": return ()
all_folders = response[1]
folders = []
for fol in all_folders:
r = re.compile(r"\(([\w\\ ]*)\) ")
m = r.search(fol)
name_attributes = fol[:m.end()-1]
folder_delimiter = fol[m.end()+1:m.end()+2]
folders.append(fol[m.end()+5:-1])
folders.sort()
return folders
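
A note on _folder_list above: it slices each LIST response line at fixed offsets from the end of the flags group. A minimal sketch of the same idea, parsing each line with one regular expression instead, might look like the following. It is written in current Python rather than the Python 2.2 this checkin targets; the names LIST_LINE, parse_list_line and folder_names and the sample lines are hypothetical and are not part of Spambayes. It assumes each line has the shape `(flags) "delimiter" name` and does not handle literals or escaped quotes inside names.

```python
import re

# One LIST response line: (flags) "delimiter" name
LIST_LINE = re.compile(
    r'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.*)')


def parse_list_line(line):
    """Split one LIST response line into (flags, delimiter, name).

    Returns None if the line does not have the expected shape.
    """
    match = LIST_LINE.match(line)
    if match is None:
        return None
    flags = match.group("flags").split()
    delim = match.group("delim")
    delimiter = None if delim == "NIL" else delim.strip('"')
    name = match.group("name").strip()
    # Folder names may or may not be quoted by the server.
    if len(name) >= 2 and name[0] == name[-1] == '"':
        name = name[1:-1]
    return flags, delimiter, name


def folder_names(status, lines):
    """Return the sorted folder names, or an empty list on failure."""
    if status != "OK":
        return []
    parsed = [parse_list_line(line) for line in lines]
    return sorted(p[2] for p in parsed if p is not None)


# Hypothetical response lines, already decoded to text.
sample = [
    '(\\HasNoChildren) "/" "INBOX"',
    '(\\HasChildren \\Noselect) "/" "Lists"',
    '(\\HasNoChildren) "/" Drafts',
]
print(folder_names("OK", sample))
```

Keeping the flags and the delimiter alongside the name would also make it possible to skip \Noselect folders when building a folder selection page like the one in the to-do list.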
--- NEW FILE: ProxyUI.py ---
"""POP3Proxy and SMTPProxy Web Interface
Classes:
ProxyUserInterface - Interface class for pop3proxy and smtpproxy
Abstract:
This module implements a browser based Spambayes user interface for the
POP3 proxy and SMTP proxy. Users may use it to interface with the
proxies.
The following functions are currently included:
[From the base class UserInterface]
onClassify - classify a given message
onWordquery - query a word from the database
onTrain - train a message or mbox
onSave - save the database and possibly shutdown
[Here]
onHome - a home page with various options
onUpload - upload a message for later training (used by proxytee.py)
onReview - show messages in corpora
onView - view a message from one of the corpora
onShowclues - show clues for a message
To do:
Web training interface:
o Review already-trained messages, and purge them.
o Put in a link to view a message (plain text, html, multipart...?)
Include a Reply link that launches the registered email client, eg.
mailto:tim at fourstonesExpressions.com?subject=Re:%20pop3proxy&body=Hi%21%0D
o [Francois Granger] Show the raw spamprob number close to the buttons
(this would mean using the extra X-Hammie header by default).
o Add Today and Refresh buttons on the Review page.
User interface improvements:
o Can it cleanly dynamically update its status display while having a POP3
conversation? Hammering reload sucks.
o Suggestions?
"""
# This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
# This module was once part of pop3proxy.py; if you are looking through
# the history of the file, you may need to go back there.
__author__ = "Richie Hindle <richie at entrian.com>"
__credits__ = "Tim Peters, Neale Pickett, Tim Stone, all the Spambayes folk."
try:
True, False
except NameError:
# Maintain compatibility with Python 2.2
True, False = 1, 0
import re
import cgi
import time
import bisect
from email.Iterators import typed_subpart_iterator
import tokenizer
import mboxutils
import UserInterface
from Options import options
global state
# This control dictionary maps http request parameters and template fields
# to ConfigParser sections and options. The key matches both the input
# field that corresponds to a section/option, and also the HTML template
# variable that is used to display the value of that section/option.
parm_ini_map = \
{'hamcutoff': ('Categorization', 'ham_cutoff'),
'spamcutoff': ('Categorization', 'spam_cutoff'),
'dbname': ('pop3proxy', 'persistent_storage_file'),
'p3servers': ('pop3proxy', 'servers'),
'p3ports': ('pop3proxy', 'ports'),
'p3notateto': ('pop3proxy', 'notate_to'),
'p3notatesub': ('pop3proxy', 'notate_subject'),
'p3cachemsg': ('pop3proxy', 'cache_messages'),
'p3addid': ('pop3proxy', 'add_mailid_to'),
'p3stripid': ('pop3proxy', 'strip_incoming_mailids'),
'p3prob': ('pop3proxy', 'include_prob'),
'p3thermostat': ('pop3proxy', 'include_thermostat'),
'p3evidence': ('pop3proxy', 'include_evidence'),
'smtpservers': ('smtpproxy', 'servers'),
'smtpports': ('smtpproxy', 'ports'),
'smtpham': ('smtpproxy', 'ham_address'),
'smtpspam': ('smtpproxy', 'spam_address'),
}
display = ('POP3 Proxy Options', 'p3servers', 'p3ports', 'p3cachemsg',
'Header Options', 'p3notateto', 'p3notatesub',
'p3prob', 'p3thermostat', 'p3evidence',
'p3addid', 'p3stripid',
'SMTP Proxy Options', 'smtpservers', 'smtpports', 'smtpham',
'smtpspam',
'Statistics Options', 'dbname', 'hamcutoff', 'spamcutoff')
class ProxyUserInterface(UserInterface.UserInterface):
"""Serves the HTML user interface for the proxies."""
def __init__(self, proxy_state, state_recreator):
global state
UserInterface.UserInterface.__init__(self, proxy_state.bayes,
parm_ini_map, display)
state = proxy_state
self.state_recreator = state_recreator # ugly
def onHome(self):
"""Serve up the homepage."""
stateDict = state.__dict__.copy()
stateDict.update(state.bayes.__dict__)
statusTable = self.html.statusTable.clone()
if not state.servers:
statusTable.proxyDetails = "No POP3 proxies running."
content = (self._buildBox('Status and Configuration',
'status.gif', statusTable % stateDict)+
self._buildBox('Train on proxied messages',
'train.gif', self.html.reviewText) +
self._buildTrainBox() +
self._buildClassifyBox() +
self._buildBox('Word query', 'query.gif',
self.html.wordQuery) +
self._buildBox('Find message', 'query.gif',
self.html.findMessage)
)
self._writePreamble("Home")
self.write(content)
self._writePostamble()
def onUpload(self, file):
"""Save a message for later training - used by Skip's proxytee.py."""
# Convert platform-specific line endings into unix-style.
file = file.replace('\r\n', '\n').replace('\r', '\n')
# Get a message list from the upload and write it into the cache.
messages = self._convertUploadToMessageList(file)
for m in messages:
messageName = state.getNewMessageName()
message = state.unknownCorpus.makeMessage(messageName)
message.setSubstance(m)
state.unknownCorpus.addMessage(message)
# Return a link Home.
self.write("<p>OK. Return <a href='home'>Home</a>.</p>")
def _keyToTimestamp(self, key):
"""Given a message key (as seen in a Corpus), returns the timestamp
for that message. This is the time that the message was received,
not the Date header."""
return long(key[:10])
def _getTimeRange(self, timestamp):
"""Given a unix timestamp, returns a 3-tuple: the start timestamp
of the given day, the end timestamp of the given day, and the
formatted date of the given day."""
# This probably works on Summertime-shift days; time will tell. 8-)
this = time.localtime(timestamp)
start = (this[0], this[1], this[2], 0, 0, 0, this[6], this[7], this[8])
end = time.localtime(time.mktime(start) + 36*60*60)
end = (end[0], end[1], end[2], 0, 0, 0, end[6], end[7], end[8])
date = time.strftime("%A, %B %d, %Y", start)
return time.mktime(start), time.mktime(end), date
def _buildReviewKeys(self, timestamp):
"""Builds an ordered list of untrained message keys, ready for output
in the Review list. Returns a 5-tuple: the keys, the formatted date
for the list (eg. "Friday, November 15, 2002"), the start of the prior
page or zero if there isn't one, likewise the start of the given page,
and likewise the start of the next page."""
# Fetch all the message keys and sort them into timestamp order.
allKeys = state.unknownCorpus.keys()
allKeys.sort()
# The default start timestamp is derived from the most recent message,
# or the system time if there are no messages (not that it gets used).
if not timestamp:
if allKeys:
timestamp = self._keyToTimestamp(allKeys[-1])
else:
timestamp = time.time()
start, end, date = self._getTimeRange(timestamp)
# Find the subset of the keys within this range.
startKeyIndex = bisect.bisect(allKeys, "%d" % long(start))
endKeyIndex = bisect.bisect(allKeys, "%d" % long(end))
keys = allKeys[startKeyIndex:endKeyIndex]
keys.reverse()
# What timestamps to use for the prior and next days? If there are any
# messages before/after this day's range, use the timestamps of those
# messages - this will skip empty days.
prior = end = 0
if startKeyIndex != 0:
prior = self._keyToTimestamp(allKeys[startKeyIndex-1])
if endKeyIndex != len(allKeys):
end = self._keyToTimestamp(allKeys[endKeyIndex])
# Return the keys and their date.
return keys, date, prior, start, end
def _appendMessages(self, table, keyedMessageInfo, label):
"""Appends the rows of a table of messages to 'table'."""
stripe = 0
for key, messageInfo in keyedMessageInfo:
row = self.html.reviewRow.clone()
if label == 'Spam':
row.spam.checked = 1
elif label == 'Ham':
row.ham.checked = 1
else:
row.defer.checked = 1
row.subject = messageInfo.subjectHeader
row.subject.title = messageInfo.bodySummary
row.subject.href="view?key=%s&corpus=%s" % (key, label)
row.from_ = messageInfo.fromHeader
subj = cgi.escape(messageInfo.subjectHeader)
row.classify.href="showclues?key=%s&subject=%s" % (key, subj)
setattr(row, 'class', ['stripe_on', 'stripe_off'][stripe]) # Grr!
row = str(row).replace('TYPE', label).replace('KEY', key)
table += row
stripe = stripe ^ 1
def onReview(self, **params):
"""Present a list of messages for (re)training."""
# Train/discard submitted messages.
self._writePreamble("Review")
id = ''
numTrained = 0
numDeferred = 0
for key, value in params.items():
if key.startswith('classify:'):
id = key.split(':')[2]
if value == 'spam':
targetCorpus = state.spamCorpus
elif value == 'ham':
targetCorpus = state.hamCorpus
elif value == 'discard':
targetCorpus = None
try:
state.unknownCorpus.removeMessage(state.unknownCorpus[id])
except KeyError:
pass # Must be a reload.
else: # defer
targetCorpus = None
numDeferred += 1
if targetCorpus:
sourceCorpus = None
if state.unknownCorpus.get(id) is not None:
sourceCorpus = state.unknownCorpus
elif state.hamCorpus.get(id) is not None:
sourceCorpus = state.hamCorpus
elif state.spamCorpus.get(id) is not None:
sourceCorpus = state.spamCorpus
if sourceCorpus is not None:
try:
targetCorpus.takeMessage(id, sourceCorpus)
if numTrained == 0:
self.write("<p><b>Training... ")
self.flush()
numTrained += 1
except KeyError:
pass # Must be a reload.
# Report on any training, and save the database if there was any.
if numTrained > 0:
plural = ''
if numTrained != 1:
plural = 's'
self.write("Trained on %d message%s. " % (numTrained, plural))
self._doSave()
self.write("<br> ")
title = ""
keys = []
sourceCorpus = state.unknownCorpus
# If any messages were deferred, show the same page again.
if numDeferred > 0:
start = self._keyToTimestamp(id)
# Else after submitting a whole page, display the prior page or the
# next one. Derive the day of the submitted page from the ID of the
# last processed message.
elif id:
start = self._keyToTimestamp(id)
unused, unused, prior, unused, next = self._buildReviewKeys(start)
if prior:
start = prior
else:
start = next
# Else if they've hit Previous or Next, display that page.
elif params.get('go') == 'Next day':
start = self._keyToTimestamp(params['next'])
elif params.get('go') == 'Previous day':
start = self._keyToTimestamp(params['prior'])
# Else if an id has been specified, just show that message
elif params.get('find') is not None:
key = params['find']
error = False
if key == "":
error = True
page = "<p>You must enter an id to find.</p>"
elif state.unknownCorpus.get(key) == None:
# maybe this message has been moved to the spam
# or ham corpus
if state.hamCorpus.get(key) != None:
sourceCorpus = state.hamCorpus
elif state.spamCorpus.get(key) != None:
sourceCorpus = state.spamCorpus
else:
error = True
page = "<p>Could not find message with id '"
page += key + "' - maybe it expired.</p>"
if error == True:
title = "Did not find message"
box = self._buildBox(title, 'status.gif', page)
self.write(box)
self.write(self._buildBox('Find message', 'query.gif',
self.html.findMessage))
self._writePostamble()
return
keys.append(params['find'])
prior = this = next = 0
title = "Found message"
# Else show the most recent day's page, as decided by _buildReviewKeys.
else:
start = 0
# Build the lists of messages: spams, hams and unsure.
if len(keys) == 0:
keys, date, prior, this, next = self._buildReviewKeys(start)
keyedMessageInfo = {options.header_spam_string: [],
options.header_ham_string: [],
options.header_unsure_string: []}
for key in keys:
# Parse the message, get the judgement header and build a message
# info object for each message.
cachedMessage = sourceCorpus[key]
message = mboxutils.get_message(cachedMessage.getSubstance())
judgement = message[options.hammie_header_name]
if judgement is None:
judgement = options.header_unsure_string
else:
judgement = judgement.split(';')[0].strip()
messageInfo = self._makeMessageInfo(message)
keyedMessageInfo[judgement].append((key, messageInfo))
# Present the list of messages in their groups in reverse order of
# appearance.
if keys:
page = self.html.reviewtable.clone()
if prior:
page.prior.value = prior
del page.priorButton.disabled
if next:
page.next.value = next
del page.nextButton.disabled
templateRow = page.reviewRow.clone()
page.table = "" # To make way for the real rows.
for header, label in ((options.header_spam_string, 'Spam'),
(options.header_ham_string, 'Ham'),
(options.header_unsure_string, 'Unsure')):
messages = keyedMessageInfo[header]
if messages:
subHeader = str(self.html.reviewSubHeader)
subHeader = subHeader.replace('TYPE', label)
page.table += self.html.blankRow
page.table += subHeader
self._appendMessages(page.table, messages, label)
page.table += self.html.trainRow
if title == "":
title = "Untrained messages received on %s" % date
box = self._buildBox(title, None, page) # No icon, to save space.
else:
page = "<p>There are no untrained messages to display. "
page += "Return <a href='home'>Home</a>.</p>"
title = "No untrained messages"
box = self._buildBox(title, 'status.gif', page)
self.write(box)
self._writePostamble()
def onView(self, key, corpus):
"""View a message - linked from the Review page."""
self._writePreamble("View message", parent=('review', 'Review'))
message = state.unknownCorpus.get(key)
if message:
self.write("<pre>%s</pre>" % cgi.escape(message.getSubstance()))
else:
self.write("<p>Can't find message %r. Maybe it expired.</p>" % key)
self._writePostamble()
def onShowclues(self, key, subject):
"""Show clues for a message - linked from the Review page."""
self._writePreamble("Message clues", parent=('review', 'Review'))
message = state.unknownCorpus.get(key).getSubstance()
message = message.replace('\r\n', '\n').replace('\r', '\n') # For Macs
if message:
results = self._buildCluesTable(message, subject)
del results.classifyAnother
self.write(results)
else:
self.write("<p>Can't find message %r. Maybe it expired.</p>" % key)
self._writePostamble()
def _makeMessageInfo(self, message):
"""Given an email.Message, return an object with subjectHeader,
fromHeader and bodySummary attributes. These objects are passed into
appendMessages by onReview - passing email.Message objects directly
uses too much memory."""
subjectHeader = message["Subject"] or "(none)"
fromHeader = message["From"] or "(none)"
try:
part = typed_subpart_iterator(message, 'text', 'plain').next()
text = part.get_payload()
except StopIteration:
try:
part = typed_subpart_iterator(message, 'text', 'html').next()
text = part.get_payload()
text, unused = tokenizer.crack_html_style(text)
text, unused = tokenizer.crack_html_comment(text)
text = tokenizer.html_re.sub(' ', text)
text = '(this message only has an HTML body)\n' + text
except StopIteration:
text = '(this message has no text body)'
if type(text) == type([]): # gotta be a 'right' way to do this
text = "(this message is a digest of %s messages)" % (len(text))
else:
text = text.replace('&nbsp;', ' ') # Else they'll be quoted
text = re.sub(r'(\s)\s+', r'\1', text) # Eg. multiple blank lines
text = text.strip()
class _MessageInfo:
pass
messageInfo = _MessageInfo()
messageInfo.subjectHeader = self._trimHeader(subjectHeader, 50, True)
messageInfo.fromHeader = self._trimHeader(fromHeader, 40, True)
messageInfo.bodySummary = self._trimHeader(text, 200)
return messageInfo
def reReadOptions(self):
"""Called by the config page when the user saves some new options, or
restores the defaults."""
# Reload the options.
global state
state.bayes.store()
import Options
reload(Options)
global options
from Options import options
# Recreate the state.
self.state_recreator()
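
A note on the Review paging above: _getTimeRange finds the local-midnight bounds of a day by jumping 36 hours forward and truncating back to midnight, and _buildReviewKeys then uses bisect on the sorted message keys to select that day. A minimal sketch of the two steps follows, in current Python rather than the Python 2.2 this checkin targets. The names day_range and keys_for_day and the sample keys are hypothetical. It assumes, as _keyToTimestamp does, that every key starts with a ten-digit unix timestamp, so sorting the keys as strings also sorts them by time.

```python
import bisect
import time


def day_range(timestamp):
    """Return (start, end) unix timestamps bounding the local day."""
    now = time.localtime(timestamp)
    # Local midnight at the start of the day; -1 lets mktime decide on DST.
    start = time.mktime(
        (now.tm_year, now.tm_mon, now.tm_mday, 0, 0, 0, 0, 0, -1))
    # 36 hours later is always inside the next calendar day, even when a
    # daylight-saving shift makes the day 23 or 25 hours long.
    later = time.localtime(start + 36 * 60 * 60)
    end = time.mktime(
        (later.tm_year, later.tm_mon, later.tm_mday, 0, 0, 0, 0, 0, -1))
    return start, end


def keys_for_day(sorted_keys, start, end):
    """Return (keys, prior, next) for the window [start, end).

    keys is newest first; prior and next are the timestamps of the
    nearest messages outside the window, or 0 if there are none.
    """
    lo = bisect.bisect_left(sorted_keys, "%d" % start)
    hi = bisect.bisect_left(sorted_keys, "%d" % end)
    keys = sorted_keys[lo:hi]
    keys.reverse()
    prior = int(sorted_keys[lo - 1][:10]) if lo > 0 else 0
    following = int(sorted_keys[hi][:10]) if hi < len(sorted_keys) else 0
    return keys, prior, following


# Hypothetical keys: ten-digit timestamp, then a sequence number.
all_keys = ["1050500000-1", "1050624001-1", "1050650672-1", "1050710400-1"]
print(keys_for_day(all_keys, 1050624000, 1050710400))
```

Using the timestamps of the neighbouring messages for the prior and next pages, rather than stepping a fixed 24 hours, is what lets the Review page skip days on which no mail arrived.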
--- NEW FILE: UserInterface.py ---
"""Web User Interface
Classes:
UserInterfaceServer - Implements the web server component
via a Dibbler plugin.
BaseUserInterface - Just has utilities for creating boxes and so forth.
(Does not include any pages)
UserInterface - A base class for Spambayes web user interfaces.
Abstract:
This module implements a browser based Spambayes user interface. Users can
*not* use this class (there is no 'home' page), but developers should
subclass it to provide an appropriate interface for their application.
Functions deemed appropriate for all application interfaces are included.
These currently include:
onClassify - classify a given message
onWordquery - query a word from the database
onTrain - train a message or mbox
onSave - save the database and possibly shutdown
onConfig - present the appropriate configuration page
To Do:
Web training interface:
o Functional tests.
o Keyboard navigation (David Ascher). But aren't Tab and left/right
arrow enough?
User interface improvements:
o Once the pieces are on separate pages, make the paste box bigger.
o Deployment: Windows executable? atlaxwin and ctypes? Or just
webbrowser?
o Save the stats (num classified, etc.) between sessions.
o "Reload database" button.
o Checkboxes need a default value (i.e. what to set the option as
when no boxes are checked). This needs to be thought about and
then implemented. add_id is an example of what it does at the
moment.
o Suggestions?
"""
# This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.
# This module was once part of pop3proxy.py; if you are looking through
# the history of the file, you may need to go back there.
# The options/configuration section started life in OptionConfig.py.
# You can find this file in the cvs attic if you want to trawl through
# its history.
__author__ = """Richie Hindle <richie at entrian.com>,
Tim Stone <tim at fourstonesExpressions.com>"""
__credits__ = "Tim Peters, Neale Pickett, Tony Meyer, all the Spambayes folk."
try:
True, False
except NameError:
# Maintain compatibility with Python 2.2
True, False = 1, 0
import re
import time
import email
import binascii
import cgi
import mailbox
import StringIO
import PyMeldLite
import Dibbler
import tokenizer
from Options import options, optionsPathname, defaults
IMAGES = ('helmet', 'status', 'config',
'message', 'train', 'classify', 'query')
global classifier
class UserInterfaceServer(Dibbler.HTTPServer):
"""Implements the web server component via a Dibbler plugin."""
def __init__(self, uiPort):
Dibbler.HTTPServer.__init__(self, uiPort)
print 'User interface url is http://localhost:%d/' % (uiPort)
class BaseUserInterface(Dibbler.HTTPPlugin):
def __init__(self):
Dibbler.HTTPPlugin.__init__(self)
htmlSource, self._images = self.readUIResources()
self.html = PyMeldLite.Meld(htmlSource, readonly=True)
def onIncomingConnection(self, clientSocket):
"""Checks the security settings."""
return options.html_ui_allow_remote_connections or \
clientSocket.getpeername()[0] == clientSocket.getsockname()[0]
def _writePreamble(self, name, parent=None, showImage=True):
"""Writes the HTML for the beginning of a page - time-consuming
methlets use this and `_writePostamble` to write the page in
pieces, including progress messages. `parent` (if given) should
be a pair: `(url, label)`, eg. `('review', 'Review')`."""
# Take the whole palette and remove the content and the footer,
# leaving the header and an empty body.
html = self.html.clone()
html.mainContent = " "
del html.footer
# Add in the name of the page and remove the link to Home if this
# *is* Home.
html.title = name
if name == 'Home':
del html.homelink
html.pagename = "Home"
elif parent:
html.pagename = "> <a href='%s'>%s</a> > %s" % \
(parent[0], parent[1], name)
else:
html.pagename = "> " + name
# Remove the helmet image if we're not showing it - this happens on
# shutdown because the browser might ask for the image after we've
# exited.
if not showImage:
del html.helmet
# Strip the closing tags, so we push as far as the start of the main
# content. We'll push the closing tags at the end.
self.writeOKHeaders('text/html')
self.write(re.sub(r'</div>\s*</body>\s*</html>', '', str(html)))
def _writePostamble(self):
"""Writes the end of time-consuming pages - see `_writePreamble`."""
footer = self.html.footer.clone()
footer.timestamp = time.asctime(time.localtime())
self.write("</div>" + footer)
self.write("</body></html>")
def _trimHeader(self, field, limit, quote=False):
"""Trims a string, adding an ellipsis if necessary and HTML-quoting
on request. Also pumps it through email.Header.decode_header, which
understands charset sections in email headers - I suspect this will
only work for Latin character sets, but hey, it works for Francois
Granger's name. 8-)"""
try:
sections = email.Header.decode_header(field)
except (binascii.Error, email.Errors.HeaderParseError):
sections = [(field, None)]
field = ' '.join([text for text, unused in sections])
if len(field) > limit:
field = field[:limit-3] + "..."
if quote:
field = cgi.escape(field)
return field
def onHome(self):
"""Serve up the homepage."""
raise NotImplementedError
def _writeImage(self, image):
self.writeOKHeaders('image/gif')
self.write(self._images[image])
# If you are easily offended, look away now...
for imageName in IMAGES:
exec "def %s(self): self._writeImage('%s')" % \
("on%sGif" % imageName.capitalize(), imageName)
def _buildBox(self, heading, icon, content):
"""Builds a yellow-headed HTML box."""
box = self.html.headedBox.clone()
box.heading = heading
if icon:
box.icon.src = icon
else:
del box.iconCell
box.boxContent = content
return box
def readUIResources(self):
"""Returns ui.html and a dictionary of Gifs."""
# Using `exec` is nasty, but I couldn't figure out a way of making
# `getattr` or `__import__` work with ResourcePackage.
from spambayes.resources import ui_html
images = {}
for baseName in IMAGES:
moduleName = '%s.%s_gif' % ('spambayes.resources', baseName)
module = __import__(moduleName, {}, {}, ('spambayes', 'resources'))
images[baseName] = module.data
return ui_html.data, images
class UserInterface(BaseUserInterface):
"""Serves the HTML user interface."""
def __init__(self, bayes, config_parms=[], config_display=[]):
"""Load up the necessary resources: ui.html and helmet.gif."""
global classifier
BaseUserInterface.__init__(self)
classifier = bayes
self.parm_ini_map = config_parms
self.display = config_display
def onClassify(self, file, text, which):
"""Classify an uploaded or pasted message."""
message = file or text
message = message.replace('\r\n', '\n').replace('\r', '\n') # For Macs
results = self._buildCluesTable(message)
results.classifyAnother = self._buildClassifyBox()
self._writePreamble("Classify")
self.write(results)
self._writePostamble()
def _buildCluesTable(self, message, subject=None):
cluesTable = self.html.cluesTable.clone()
cluesRow = cluesTable.cluesRow.clone()
del cluesTable.cluesRow # Delete dummy row to make way for real ones
(probability, clues) = classifier.spamprob(tokenizer.tokenize(message),\
evidence=True)
for word, wordProb in clues:
cluesTable += cluesRow % (cgi.escape(word), wordProb)
results = self.html.classifyResults.clone()
results.probability = probability
if subject is None:
heading = "Clues:"
else:
heading = "Clues for: " + subject
results.cluesBox = self._buildBox(heading, 'status.gif', cluesTable)
return results
def onWordquery(self, word):
if word == "":
stats = "You must enter a word."
else:
word = word.lower()
wordinfo = classifier._wordinfoget(word)
if wordinfo:
stats = self.html.wordStats.clone()
stats.spamcount = wordinfo.spamcount
stats.hamcount = wordinfo.hamcount
stats.spamprob = classifier.probability(wordinfo)
else:
stats = "%r does not exist in the database." % cgi.escape(word)
query = self.html.wordQuery.clone()
query.word.value = word
statsBox = self._buildBox("Statistics for %r" % cgi.escape(word),
'status.gif', stats)
queryBox = self._buildBox("Word query", 'query.gif', query)
self._writePreamble("Word query")
self.write(statsBox + queryBox)
self._writePostamble()
def onTrain(self, file, text, which):
"""Train on an uploaded or pasted message."""
self._writePreamble("Train")
# Upload or paste? Spam or ham?
content = file or text
isSpam = (which == 'Train as Spam')
# Convert platform-specific line endings into unix-style.
content = content.replace('\r\n', '\n').replace('\r', '\n')
# The upload might be a single message or an mbox file.
messages = self._convertUploadToMessageList(content)
# Append the message(s) to a file, to make it easier to rebuild
# the database later. This is a temporary implementation -
# it should keep a Corpus of trained messages.
if isSpam:
f = open("_pop3proxyspam.mbox", "a")
else:
f = open("_pop3proxyham.mbox", "a")
# Train on the uploaded message(s).
self.write("<b>Training...</b>\n")
self.flush()
for message in messages:
tokens = tokenizer.tokenize(message)
classifier.learn(tokens, isSpam)
f.write("From pop3proxy at spambayes.org Sat Jan 31 00:00:00 2000\n")
f.write(message)
f.write("\n\n")
# Save the database and return a link Home and another training form.
f.close()
self._doSave()
self.write("<p>OK. Return <a href='home'>Home</a> or train again:</p>")
self.write(self._buildTrainBox())
self._writePostamble()
def _convertUploadToMessageList(self, content):
"""Returns a list of raw messages extracted from uploaded content.
You can upload either a single message or an mbox file."""
if content.startswith('From '):
# Get a list of raw messages from the mbox content.
class SimpleMessage:
def __init__(self, fp):
self.guts = fp.read()
contentFile = StringIO.StringIO(content)
mbox = mailbox.PortableUnixMailbox(contentFile, SimpleMessage)
return map(lambda m: m.guts, mbox)
else:
# Just the one message.
return [content]
def _doSave(self):
"""Saves the database."""
self.write("<b>Saving... ")
self.flush()
classifier.store()
self.write("Done</b>.\n")
def onSave(self, how):
"""Command handler for "Save" and "Save & shutdown"."""
isShutdown = how.lower().find('shutdown') >= 0
self._writePreamble("Save", showImage=(not isShutdown))
self._doSave()
if isShutdown:
self.write("<p>%s</p>" % self.html.shutdownMessage)
self.write("</div></body></html>")
self.flush()
## Is this still required?: self.shutdown(2)
self.close()
raise SystemExit
self._writePostamble()
def _buildClassifyBox(self):
"""Returns a "Classify a message" box. This is used on both the Home
page and the classify results page. The Classify form is based on the
Upload form."""
form = self.html.upload.clone()
del form.or_mbox
del form.submit_spam
del form.submit_ham
form.action = "classify"
return self._buildBox("Classify a message", 'classify.gif', form)
def _buildTrainBox(self):
"""Returns a "Train on a given message" box. This is used on both
the Home page and the training results page. The Train form is
based on the Upload form."""
form = self.html.upload.clone()
del form.submit_classify
return self._buildBox("Train on a given message", 'message.gif', form)
def reReadOptions(self):
"""Called by the config page when the user saves some new options,
or restores the defaults."""
pass
def onConfig(self):
# Start with an empty config form then add the sections.
html = self.html.clone()
# "Save and Shutdown" is confusing here - it means "Save database"
# but that's not clear.
html.shutdownTableCell = " "
html.mainContent = self.html.configForm.clone()
html.mainContent.configFormContent = ""
html.mainContent.optionsPathname = optionsPathname
configTable = None
section = None
# Loop through the sections.
for html_key in self.display:
if not self.parm_ini_map.has_key(html_key):
if configTable is not None and section is not None:
# Finish off the box for this section and add it
# to the form.
section.boxContent = configTable
html.configFormContent += section
# Start the yellow-headed box for this section.
section = self.html.headedBox.clone()
# Get a clone of the config table and a clone of each
# example row, then blank out the example rows to make way
# for the real ones.
configTable = self.html.configTable.clone()
configTextRow1 = configTable.configTextRow1.clone()
configCbRow1 = configTable.configCbRow1.clone()
configRow2 = configTable.configRow2.clone()
blankRow = configTable.blankRow.clone()
del configTable.configTextRow1
del configTable.configCbRow1
del configTable.configRow2
del configTable.blankRow
section.heading = html_key
del section.iconCell
continue
(sect, opt) = self.parm_ini_map[html_key]
# Populate the rows with the details and add them to the table.
if type(options.valid_input(sect, opt)) == type(""):
# we provide a text input
newConfigRow1 = configTextRow1.clone()
newConfigRow1.label = options.display_name(sect, opt)
newConfigRow1.input.name = html_key
newConfigRow1.input.value = options.get(sect, opt)
else:
# we provide checkboxes/radio buttons
newConfigRow1 = configCbRow1.clone()
newConfigRow1.label = options.display_name(sect, opt)
blankOption = newConfigRow1.input.clone()
firstOpt = True
i = 0
for val in options.valid_input(sect, opt):
newOption = blankOption.clone()
if str(val) in str(options[sect, opt]).split():
newOption.input_box.checked = "checked"
# help for Python 2.2
if options.is_boolean(sect, opt):
if str(val) == "0":
val = "False"
elif str(val) == "1":
val = "True"
newOption.val_label = str(val)
if options.multiple_values_allowed(sect, opt):
newOption.input_box.type = "checkbox"
newOption.input_box.name = html_key + '-' + str(i)
i += 1
else:
newOption.input_box.type = "radio"
newOption.input_box.name = html_key
newOption.input_box.value = str(val)
if firstOpt:
newConfigRow1.input = newOption
firstOpt = False
else:
newConfigRow1.input += newOption
# Insert the help text in a cell
newConfigRow1.helpCell = '<strong>' + \
options.display_name(sect, opt) + \
':</strong> ' + \
cgi.escape(options.doc(sect, opt))
newConfigRow2 = configRow2.clone()
currentValue = options[sect, opt]
# for Python 2.2
if options.is_boolean(sect, opt):
if str(currentValue) == '0':
currentValue = "False"
elif str(currentValue) == '1':
currentValue = "True"
newConfigRow2.currentValue = currentValue
configTable += newConfigRow1 + newConfigRow2 + blankRow
# Finish off the box for this section and add it to the form.
if section is not None:
section.boxContent = configTable
html.configFormContent += section
html.title = 'Home > Configure'
html.pagename = '> Configure'
self.writeOKHeaders('text/html')
self.write(html)
def onChangeopts(self, **parms):
html = self.html.clone()
html.shutdownTableCell = " "
html.mainContent = self.html.headedBox.clone()
errmsg = self.verifyInput(parms)
if errmsg != '':
html.mainContent.heading = "Errors Detected"
html.mainContent.boxContent = errmsg
html.title = 'Home > Error'
html.pagename = '> Error'
self.writeOKHeaders('text/html')
self.write(html)
return
for name, value in parms.items():
if self.parm_ini_map.has_key(name):
sect, opt = self.parm_ini_map[name]
options.set(sect, opt, value)
op = open(optionsPathname, "r")
options.update_file(op)
op.close()
self.reReadOptions()
html.mainContent.heading = "Options Changed"
html.mainContent.boxContent = "%s. Return <a href='home'>Home</a>." \
% "Options changed"
html.title = 'Home > Options Changed'
html.pagename = '> Options Changed'
self.writeOKHeaders('text/html')
self.write(html)
def onRestoredefaults(self, how):
self.restoreConfigDefaults()
self.reReadOptions()
html = self.html.clone()
html.shutdownTableCell = " "
html.mainContent = self.html.headedBox.clone()
html.mainContent.heading = "Option Defaults Restored"
html.mainContent.boxContent = "%s. Return <a href='home'>Home</a>." \
% "Defaults restored"
html.title = 'Home > Defaults Restored'
html.pagename = '> Defaults Restored'
self.writeOKHeaders('text/html')
self.write(html)
def verifyInput(self, parms):
'''Check that the given input is valid.'''
# Most of the work here is done by the options class, but
# we have a few extra checks that are beyond its capabilities
errmsg = ''
# mumbo-jumbo to deal with the checkboxes
# XXX This will break with more than 9 checkboxes
# XXX A better solution is needed than this
for name, value in parms.items():
if name[-2:-1] == '-':
if parms.has_key(name[:-2]):
parms[name[:-2]] += (value,)
else:
parms[name[:-2]] = (value,)
del parms[name]
for html_key in self.display:
if not self.parm_ini_map.has_key(html_key):
nice_section_name = html_key
continue
sect, opt = self.parm_ini_map[html_key]
if not parms.has_key(html_key):
# This is a set of checkboxes where none are selected
value = None
else:
value = parms[html_key]
if value is not None:
if type(value) == type((0,1)):
value_string = ""
for val in value:
value_string += val
value_string += ','
value = value_string[:-1]
value = options.convert(sect, opt, value)
if not options.is_valid(sect, opt, value):
errmsg += '<li>\'%s\' is not a valid value for [%s] %s' % \
(value, nice_section_name,
options.display_name(sect, opt))
if type(options.valid_input(sect, opt)) == type((0,1)):
errmsg += '. Valid values are: '
for valid in options.valid_input(sect, opt):
errmsg += str(valid) + ','
errmsg = errmsg[:-1] # cut last ','
errmsg += '</li>'
parms[html_key] = value
# check for equal number of pop3servers and ports
slist = parms['p3servers'].split(',')
plist = parms['p3ports'].split(',')
if len(slist) != len(plist):
errmsg += '<li>The number of POP3 proxy ports specified ' + \
'must match the number of servers specified</li>\n'
# check for duplicate ports
plist.sort()
for p in range(len(plist)-1):
try:
if plist[p] == plist[p+1]:
errmsg += '<li>All POP3 port numbers must be unique</li>'
break
except IndexError:
pass
# check for equal number of smtpservers and ports
slist = parms['smtpservers'].split(',')
plist = parms['smtpports'].split(',')
if len(slist) != len(plist):
errmsg += '<li>The number of SMTP proxy ports specified ' + \
'must match the number of servers specified</li>\n'
# check for duplicate ports
plist.sort()
for p in range(len(plist)-1):
try:
if plist[p] == plist[p+1]:
errmsg += '<li>All SMTP port numbers must be unique</li>'
break
except IndexError:
pass
return errmsg
def restoreConfigDefaults(self):
# Note that the behaviour of this function has subtly changed:
# previously, options were removed from the config file; now the
# config file is updated to match the defaults.
c = ConfigParser()
d = StringIO(defaults)
c.readfp(d)
del d
# Only restore the settings that appear on the form.
for section, option in self.parm_ini_map.values():
if not options.no_restore(section, option):
options.set(section, option, c.get(section,option))
op = open(optionsPathname, "r")
options.update_file(op)
op.close()
Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** tokenizer.py 6 Mar 2003 15:47:19 -0000 1.7
--- tokenizer.py 18 Apr 2003 09:24:29 -0000 1.8
***************
*** 17,21 ****
from sets import Set
except ImportError:
! from spambayes.compatsets import Set
--- 17,21 ----
from sets import Set
except ImportError:
! from compatsets import Set