[Spambayes-checkins] spambayes/spambayes ImapUI.py, NONE, 1.1 ProxyUI.py, NONE, 1.1 UserInterface.py, NONE, 1.1 tokenizer.py, 1.7, 1.8

Tony Meyer anadelonbrin at users.sourceforge.net
Fri Apr 18 03:24:32 EDT 2003


Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv18385/spambayes

Modified Files:
	tokenizer.py 
Added Files:
	ImapUI.py ProxyUI.py UserInterface.py 
Log Message:
More modular user interface files (code mostly ripped out
of pop3proxy.py, as the comments indicate it will be).

--- NEW FILE: ImapUI.py ---
"""IMAPFilter Web Interface

Classes:
    IMAPUserInterface - Interface class for the IMAP filter

Abstract:

This module implements a browser based Spambayes user interface for the
IMAP filter.  Users may use it to interface with the filter - it is
expected that this will primarily be for configuration, although users
may also wish to look up words in the database, or classify a message.

The following functions are currently included:
[From the base class UserInterface]
  onClassify - classify a given message
  onWordquery - query a word from the database
  onTrain - train a message or mbox
  onSave - save the database and possibly shutdown
[Here]
  onHome - a home page with various options

To do:
 o There is a function to get a list of all the folders available on
   the server, but nothing is done with this.  Obviously what we would
   like is to present a page where the user selects (checkboxes) the
   folders that s/he wishes to filter, the folders s/he wishes to use
   as train-as-ham and train-as-spam, and (radio buttons) the folders
   to move suspected spam and unsures into.  I think this should be
   a separate page from the standard config as it's already going to
   be really big if there are lots of folders to choose from.
   An alternative design would be to have a single list of the folders
   and five columns - three of checkboxes (filter, train-as-spam and
   train-as-ham) and two of radio buttons (spam folder and unsure folder).
   I think this might be more confusing, though.
 o This could have a neat review page, like pop3proxy, built up by
   asking the IMAP server appropriate questions.  I don't know whether
   this is needed, however.  This would then allow viewing a message,
   showing the clues for it, and so on.  Finding a message (via the
   spambayes id) could also be done.
 o Suggestions?
"""

# This module is part of the spambayes project, which is Copyright 2002-3
# The Python Software Foundation and is covered by the Python Software
# Foundation license.

__author__ = "Tony Meyer <ta-meyer at ihug.co.nz>, Tim Stone"
__credits__ = "All the Spambayes folk."

try:
    True, False
except NameError:
    # Maintain compatibility with Python 2.2
    True, False = 1, 0

import re

import UserInterface
from Options import options

global classifier

# This control dictionary maps http request parameters and template fields
# to ConfigParser sections and options.  The key matches both the input
# field that corresponds to a section/option, and also the HTML template
# variable that is used to display the value of that section/option.
parm_map = \
   {'hamcutoff':    ('Categorization',  'ham_cutoff'),
    'spamcutoff':   ('Categorization',  'spam_cutoff'),
    'dbname':       ('pop3proxy',       'persistent_storage_file'),
    'imapserver':   ('imap',            'server'),
    'imapport':     ('imap',            'port'),
    'imapusername': ('imap',            'username'),
    'imappassword': ('imap',            'password'),
    'p3notateto':   ('pop3proxy',       'notate_to'),
    'p3notatesub':  ('pop3proxy',       'notate_subject'),
    'p3addid':      ('pop3proxy',       'add_mailid_to'),
    'p3stripid':    ('pop3proxy',       'strip_incoming_mailids'),
    'p3prob':       ('pop3proxy',       'include_prob'),
    'p3thermostat': ('pop3proxy',       'include_thermostat'),
    'p3evidence':   ('pop3proxy',       'include_evidence'),
   }

display = ('IMAP Options', 'imapserver', 'imapport', 'imapusername',
           # to display, or not to display; that is the question
           # if we show this here, it's in plain text for everyone to
           # see (and worse - if we don't restrict connections to
           # localhost, it's available for the world to see)
           # on the other hand, we have to be able to enter it somehow...
           'imappassword',
           'Header Options', 'p3notateto', 'p3notatesub', 
           'p3prob', 'p3thermostat', 'p3evidence', 
           'p3addid', 'p3stripid',
           'Statistics Options', 'dbname', 'hamcutoff', 'spamcutoff')

class IMAPUserInterface(UserInterface.UserInterface):
    """Serves the HTML user interface for the IMAP filter."""

    def __init__(self, cls, imap):
        global classifier
        UserInterface.UserInterface.__init__(self, cls, parm_map, display)
        classifier = cls
        self.imap = imap

    def onHome(self):
        """Serve up the homepage."""
        stateDict = classifier.__dict__.copy()
        statusTable = self.html.statusTable.clone()
        del statusTable.proxyDetails
        content = (self._buildBox('Status and Configuration',
                                  'status.gif', statusTable % stateDict)+
                   self._buildTrainBox() +
                   self._buildClassifyBox() +
                   self._buildBox('Word query', 'query.gif',
                                  self.html.wordQuery)
                   )
        self._writePreamble("Home")
        self.write(content)
        self._writePostamble()

    def reReadOptions(self):
        """Called by the config page when the user saves some new options, or
        restores the defaults."""
        # Reload the options.
        global classifier
        classifier.store()
        import Options
        reload(Options)
        global options
        from Options import options

    def _folder_list(self):
        '''Return an alphabetical list of all folders available
        on the server'''
        # The connection is stored on the instance by __init__.
        response = self.imap.list()
        if response[0] != "OK":
            return ()
        all_folders = response[1]
        folders = []
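        # Each line looks like: (\HasNoChildren) "/" "INBOX"
        r = re.compile(r"\(([\w\\ ]*)\) ")
        for fol in all_folders:
            m = r.search(fol)
            if m is None:
                continue  # Not a line we know how to parse.
            name_attributes = fol[:m.end()-1]
            folder_delimiter = fol[m.end()+1:m.end()+2]
            # Skip the quoted delimiter and strip the quotes from the name.
            folders.append(fol[m.end()+5:-1])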
        folders.sort()
        return folders
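
The fixed-offset slicing in _folder_list assumes every LIST line has a
quoted one-character delimiter and a quoted folder name.  A more
tolerant parse could look roughly like the sketch below.  The names
LIST_LINE, parse_list_line and folder_names are hypothetical and not
part of this check-in, and the sample lines are made up.  It also
assumes the lines are already str (under Python 3, imaplib returns
bytes that would need decoding first).

```python
import re

# Matches lines such as: (\HasNoChildren) "/" "INBOX"
# flags: text inside the parentheses; delim: quoted string or NIL;
# name: the rest of the line (quoted or bare).
LIST_LINE = re.compile(
    r'\((?P<flags>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.*)$')


def parse_list_line(line):
    """Return (flags, delimiter, name), or None if the line is not understood."""
    m = LIST_LINE.match(line)
    if m is None:
        return None
    flags = m.group('flags').split()
    delim = m.group('delim')
    if delim == 'NIL':
        delim = None          # Flat namespace: no hierarchy delimiter.
    else:
        delim = delim[1:-1]   # Strip the surrounding quotes.
    name = m.group('name')
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        name = name[1:-1]
    return flags, delim, name


def folder_names(lines):
    """Return the sorted folder names from a list of LIST response lines."""
    names = []
    for line in lines:
        parsed = parse_list_line(line)
        if parsed is not None:
            names.append(parsed[2])
    names.sort()
    return names


# Hypothetical sample data, for illustration only.
sample = [
    '(\\HasNoChildren) "/" "Spam"',
    '(\\HasChildren \\Noselect) "/" "Archive"',
    '() "." INBOX',
]
print(folder_names(sample))
```

Lines that do not match are skipped rather than raising, which seems the
safer choice for a configuration page.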

--- NEW FILE: ProxyUI.py ---
"""POP3Proxy and SMTPProxy Web Interface

Classes:
    ProxyUserInterface - Interface class for pop3proxy and smtpproxy

Abstract:

This module implements a browser based Spambayes user interface for the
POP3 proxy and SMTP proxy.  Users may use it to interface with the
proxies.

The following functions are currently included:
[From the base class UserInterface]
  onClassify - classify a given message
  onWordquery - query a word from the database
  onTrain - train a message or mbox
  onSave - save the database and possibly shutdown
[Here]
  onHome - a home page with various options
  onUpload - upload a message for later training (used by proxytee.py)
  onReview - show messages in corpora
  onView - view a message from one of the corpora
  onShowclues - show clues for a message

To do:

Web training interface:

 o Review already-trained messages, and purge them.
 o Put in a link to view a message (plain text, html, multipart...?)
   Include a Reply link that launches the registered email client, eg.
   mailto:tim at fourstonesExpressions.com?subject=Re:%20pop3proxy&body=Hi%21%0D
 o [Francois Granger] Show the raw spamprob number close to the buttons
   (this would mean using the extra X-Hammie header by default).
 o Add Today and Refresh buttons on the Review page.

User interface improvements:

 o Can it cleanly dynamically update its status display while having a POP3
   conversation?  Hammering reload sucks.

 o Suggestions?
"""

# This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.

# This module was once part of pop3proxy.py; if you are looking through
# the history of the file, you may need to go back there.

__author__ = "Richie Hindle <richie at entrian.com>"
__credits__ = "Tim Peters, Neale Pickett, Tim Stone, all the Spambayes folk."

try:
    True, False
except NameError:
    # Maintain compatibility with Python 2.2
    True, False = 1, 0

import re
import time
import bisect
import cgi
from email.Iterators import typed_subpart_iterator

import tokenizer
import mboxutils
import UserInterface
from Options import options

global state

# This control dictionary maps http request parameters and template fields
# to ConfigParser sections and options.  The key matches both the input
# field that corresponds to a section/option, and also the HTML template
# variable that is used to display the value of that section/option.
parm_ini_map = \
   {'hamcutoff':    ('Categorization',  'ham_cutoff'),
    'spamcutoff':   ('Categorization',  'spam_cutoff'),
    'dbname':       ('pop3proxy',       'persistent_storage_file'),
    'p3servers':    ('pop3proxy',       'servers'),
    'p3ports':      ('pop3proxy',       'ports'),
    'p3notateto':   ('pop3proxy',       'notate_to'),
    'p3notatesub':  ('pop3proxy',       'notate_subject'),
    'p3cachemsg':   ('pop3proxy',       'cache_messages'),
    'p3addid':      ('pop3proxy',       'add_mailid_to'),
    'p3stripid':    ('pop3proxy',       'strip_incoming_mailids'),
    'p3prob':       ('pop3proxy',       'include_prob'),
    'p3thermostat': ('pop3proxy',       'include_thermostat'),
    'p3evidence':   ('pop3proxy',       'include_evidence'),
    'smtpservers':  ('smtpproxy',       'servers'),
    'smtpports':    ('smtpproxy',       'ports'),
    'smtpham':      ('smtpproxy',       'ham_address'),
    'smtpspam':     ('smtpproxy',       'spam_address'),
   }

display = ('POP3 Proxy Options', 'p3servers', 'p3ports', 'p3cachemsg',
           'Header Options', 'p3notateto', 'p3notatesub', 
           'p3prob', 'p3thermostat', 'p3evidence', 
           'p3addid', 'p3stripid',
           'SMTP Proxy Options', 'smtpservers', 'smtpports', 'smtpham',
           'smtpspam',
           'Statistics Options', 'dbname', 'hamcutoff', 'spamcutoff')


class ProxyUserInterface(UserInterface.UserInterface):
    """Serves the HTML user interface for the proxies."""

    def __init__(self, proxy_state, state_recreator):
        global state
        UserInterface.UserInterface.__init__(self, proxy_state.bayes,
                                             parm_ini_map, display)
        state = proxy_state
        self.state_recreator = state_recreator # ugly

    def onHome(self):
        """Serve up the homepage."""
        stateDict = state.__dict__.copy()
        stateDict.update(state.bayes.__dict__)
        statusTable = self.html.statusTable.clone()
        if not state.servers:
            statusTable.proxyDetails = "No POP3 proxies running."
        content = (self._buildBox('Status and Configuration',
                                  'status.gif', statusTable % stateDict)+
                   self._buildBox('Train on proxied messages',
                                  'train.gif', self.html.reviewText) +
                   self._buildTrainBox() +
                   self._buildClassifyBox() +
                   self._buildBox('Word query', 'query.gif',
                                  self.html.wordQuery) +
                   self._buildBox('Find message', 'query.gif',
                                  self.html.findMessage)
                   )
        self._writePreamble("Home")
        self.write(content)
        self._writePostamble()

    def onUpload(self, file):
        """Save a message for later training - used by Skip's proxytee.py."""
        # Convert platform-specific line endings into unix-style.
        file = file.replace('\r\n', '\n').replace('\r', '\n')

        # Get a message list from the upload and write it into the cache.
        messages = self._convertUploadToMessageList(file)
        for m in messages:
            messageName = state.getNewMessageName()
            message = state.unknownCorpus.makeMessage(messageName)
            message.setSubstance(m)
            state.unknownCorpus.addMessage(message)

        # Return a link Home.
        self.write("<p>OK. Return <a href='home'>Home</a>.</p>")

    def _keyToTimestamp(self, key):
        """Given a message key (as seen in a Corpus), returns the timestamp
        for that message.  This is the time that the message was received,
        not the Date header."""
        return long(key[:10])

    def _getTimeRange(self, timestamp):
        """Given a unix timestamp, returns a 3-tuple: the start timestamp
        of the given day, the end timestamp of the given day, and the
        formatted date of the given day."""
        # This probably works on Summertime-shift days; time will tell.  8-)
        this = time.localtime(timestamp)
        start = (this[0], this[1], this[2], 0, 0, 0, this[6], this[7], this[8])
        end = time.localtime(time.mktime(start) + 36*60*60)
        end = (end[0], end[1], end[2], 0, 0, 0, end[6], end[7], end[8])
        date = time.strftime("%A, %B %d, %Y", start)
        return time.mktime(start), time.mktime(end), date

    def _buildReviewKeys(self, timestamp):
        """Builds an ordered list of untrained message keys, ready for output
        in the Review list.  Returns a 5-tuple: the keys, the formatted date
        for the list (eg. "Friday, November 15, 2002"), the start of the prior
        page or zero if there isn't one, likewise the start of the given page,
        and likewise the start of the next page."""
        # Fetch all the message keys and sort them into timestamp order.
        allKeys = state.unknownCorpus.keys()
        allKeys.sort()

        # The default start timestamp is derived from the most recent message,
        # or the system time if there are no messages (not that it gets used).
        if not timestamp:
            if allKeys:
                timestamp = self._keyToTimestamp(allKeys[-1])
            else:
                timestamp = time.time()
        start, end, date = self._getTimeRange(timestamp)

        # Find the subset of the keys within this range.
        startKeyIndex = bisect.bisect(allKeys, "%d" % long(start))
        endKeyIndex = bisect.bisect(allKeys, "%d" % long(end))
        keys = allKeys[startKeyIndex:endKeyIndex]
        keys.reverse()

        # What timestamps to use for the prior and next days?  If there are any
        # messages before/after this day's range, use the timestamps of those
        # messages - this will skip empty days.
        prior = end = 0
        if startKeyIndex != 0:
            prior = self._keyToTimestamp(allKeys[startKeyIndex-1])
        if endKeyIndex != len(allKeys):
            end = self._keyToTimestamp(allKeys[endKeyIndex])

        # Return the keys and their date.
        return keys, date, prior, start, end

    def _appendMessages(self, table, keyedMessageInfo, label):
        """Appends the rows of a table of messages to 'table'."""
        stripe = 0
        for key, messageInfo in keyedMessageInfo:
            row = self.html.reviewRow.clone()
            if label == 'Spam':
                row.spam.checked = 1
            elif label == 'Ham':
                row.ham.checked = 1
            else:
                row.defer.checked = 1
            row.subject = messageInfo.subjectHeader
            row.subject.title = messageInfo.bodySummary
            row.subject.href="view?key=%s&corpus=%s" % (key, label)
            row.from_ = messageInfo.fromHeader
            subj = cgi.escape(messageInfo.subjectHeader)
            row.classify.href="showclues?key=%s&subject=%s" % (key, subj)
            setattr(row, 'class', ['stripe_on', 'stripe_off'][stripe]) # Grr!
            row = str(row).replace('TYPE', label).replace('KEY', key)
            table += row
            stripe = stripe ^ 1

    def onReview(self, **params):
        """Present a list of messages for (re)training."""
        # Train/discard submitted messages.
        self._writePreamble("Review")
        id = ''
        numTrained = 0
        numDeferred = 0
        for key, value in params.items():
            if key.startswith('classify:'):
                id = key.split(':')[2]
                if value == 'spam':
                    targetCorpus = state.spamCorpus
                elif value == 'ham':
                    targetCorpus = state.hamCorpus
                elif value == 'discard':
                    targetCorpus = None
                    try:
                        state.unknownCorpus.removeMessage(state.unknownCorpus[id])
                    except KeyError:
                        pass  # Must be a reload.
                else: # defer
                    targetCorpus = None
                    numDeferred += 1
                if targetCorpus:
                    sourceCorpus = None
                    if state.unknownCorpus.get(id) is not None:
                        sourceCorpus = state.unknownCorpus
                    elif state.hamCorpus.get(id) is not None:
                        sourceCorpus = state.hamCorpus
                    elif state.spamCorpus.get(id) is not None:
                        sourceCorpus = state.spamCorpus
                    if sourceCorpus is not None:
                        try:
                            targetCorpus.takeMessage(id, sourceCorpus)
                            if numTrained == 0:
                                self.write("<p><b>Training... ")
                                self.flush()
                            numTrained += 1
                        except KeyError:
                            pass  # Must be a reload.

        # Report on any training, and save the database if there was any.
        if numTrained > 0:
            plural = ''
            if numTrained != 1:
                plural = 's'
            self.write("Trained on %d message%s. " % (numTrained, plural))
            self._doSave()
            self.write("<br>&nbsp;")

        title = ""
        keys = []
        sourceCorpus = state.unknownCorpus
        # If any messages were deferred, show the same page again.
        if numDeferred > 0:
            start = self._keyToTimestamp(id)

        # Else after submitting a whole page, display the prior page or the
        # next one.  Derive the day of the submitted page from the ID of the
        # last processed message.
        elif id:
            start = self._keyToTimestamp(id)
            unused, unused, prior, unused, next = self._buildReviewKeys(start)
            if prior:
                start = prior
            else:
                start = next

        # Else if they've hit Previous or Next, display that page.
        elif params.get('go') == 'Next day':
            start = self._keyToTimestamp(params['next'])
        elif params.get('go') == 'Previous day':
            start = self._keyToTimestamp(params['prior'])

        # Else if an id has been specified, just show that message
        elif params.get('find') is not None:
            key = params['find']
            error = False
            if key == "":
                error = True
                page = "<p>You must enter an id to find.</p>"
            elif state.unknownCorpus.get(key) == None:
                # maybe this message has been moved to the spam
                # or ham corpus
                if state.hamCorpus.get(key) != None:
                    sourceCorpus = state.hamCorpus
                elif state.spamCorpus.get(key) != None:
                    sourceCorpus = state.spamCorpus
                else:
                    error = True
                    page = "<p>Could not find message with id '"
                    page += key + "' - maybe it expired.</p>"
            if error == True:
                title = "Did not find message"
                box = self._buildBox(title, 'status.gif', page)
                self.write(box)
                self.write(self._buildBox('Find message', 'query.gif',
                                          self.html.findMessage))
                self._writePostamble()
                return
            keys.append(params['find'])
            prior = this = next = 0
            title = "Found message"

        # Else show the most recent day's page, as decided by _buildReviewKeys.
        else:
            start = 0

        # Build the lists of messages: spams, hams and unsure.
        if len(keys) == 0:
            keys, date, prior, this, next = self._buildReviewKeys(start)
        keyedMessageInfo = {options.header_spam_string: [],
                            options.header_ham_string: [],
                            options.header_unsure_string: []}
        for key in keys:
            # Parse the message, get the judgement header and build a message
            # info object for each message.
            cachedMessage = sourceCorpus[key]
            message = mboxutils.get_message(cachedMessage.getSubstance())
            judgement = message[options.hammie_header_name]
            if judgement is None:
                judgement = options.header_unsure_string
            else:
                judgement = judgement.split(';')[0].strip()
            messageInfo = self._makeMessageInfo(message)
            keyedMessageInfo[judgement].append((key, messageInfo))

        # Present the list of messages in their groups in reverse order of
        # appearance.
        if keys:
            page = self.html.reviewtable.clone()
            if prior:
                page.prior.value = prior
                del page.priorButton.disabled
            if next:
                page.next.value = next
                del page.nextButton.disabled
            templateRow = page.reviewRow.clone()
            page.table = ""  # To make way for the real rows.
            for header, label in ((options.header_spam_string, 'Spam'),
                                  (options.header_ham_string, 'Ham'),
                                  (options.header_unsure_string, 'Unsure')):
                messages = keyedMessageInfo[header]
                if messages:
                    subHeader = str(self.html.reviewSubHeader)
                    subHeader = subHeader.replace('TYPE', label)
                    page.table += self.html.blankRow
                    page.table += subHeader
                    self._appendMessages(page.table, messages, label)

            page.table += self.html.trainRow
            if title == "":
                title = "Untrained messages received on %s" % date
            box = self._buildBox(title, None, page)  # No icon, to save space.
        else:
            page = "<p>There are no untrained messages to display. "
            page += "Return <a href='home'>Home</a>.</p>"
            title = "No untrained messages"
            box = self._buildBox(title, 'status.gif', page)

        self.write(box)
        self._writePostamble()

    def onView(self, key, corpus):
        """View a message - linked from the Review page."""
        self._writePreamble("View message", parent=('review', 'Review'))
        message = state.unknownCorpus.get(key)
        if message:
            self.write("<pre>%s</pre>" % cgi.escape(message.getSubstance()))
        else:
            self.write("<p>Can't find message %r. Maybe it expired.</p>" % key)
        self._writePostamble()

    def onShowclues(self, key, subject):
        """Show clues for a message - linked from the Review page."""
        self._writePreamble("Message clues", parent=('review', 'Review'))
        # Check that the message exists before asking for its substance;
        # get() returns None for an expired or unknown key.
        message = state.unknownCorpus.get(key)
        if message:
            message = message.getSubstance()
            message = message.replace('\r\n', '\n').replace('\r', '\n') # For Macs
            results = self._buildCluesTable(message, subject)
            del results.classifyAnother
            self.write(results)
        else:
            self.write("<p>Can't find message %r. Maybe it expired.</p>" % key)
        self._writePostamble()

    def _makeMessageInfo(self, message):
        """Given an email.Message, return an object with subjectHeader,
        fromHeader and bodySummary attributes.  These objects are passed into
        _appendMessages by onReview - passing email.Message objects directly
        uses too much memory."""
        subjectHeader = message["Subject"] or "(none)"
        fromHeader = message["From"] or "(none)"
        try:
            part = typed_subpart_iterator(message, 'text', 'plain').next()
            text = part.get_payload()
        except StopIteration:
            try:
                part = typed_subpart_iterator(message, 'text', 'html').next()
                text = part.get_payload()
                text, unused = tokenizer.crack_html_style(text)
                text, unused = tokenizer.crack_html_comment(text)
                text = tokenizer.html_re.sub(' ', text)
                text = '(this message only has an HTML body)\n' + text
            except StopIteration:
                text = '(this message has no text body)'
        if type(text) == type([]):  # gotta be a 'right' way to do this
            text = "(this message is a digest of %s messages)" % (len(text))
        else:
            text = text.replace('&nbsp;', ' ')      # Else they'll be quoted
            text = re.sub(r'(\s)\s+', r'\1', text)  # Eg. multiple blank lines
            text = text.strip()

        class _MessageInfo:
            pass
        messageInfo = _MessageInfo()
        messageInfo.subjectHeader = self._trimHeader(subjectHeader, 50, True)
        messageInfo.fromHeader = self._trimHeader(fromHeader, 40, True)
        messageInfo.bodySummary = self._trimHeader(text, 200)
        return messageInfo

    def reReadOptions(self):
        """Called by the config page when the user saves some new options, or
        restores the defaults."""
        # Reload the options.
        global state
        state.bayes.store()
        import Options
        reload(Options)
        global options
        from Options import options

        # Recreate the state.
        self.state_recreator()
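
The day-paging logic in _getTimeRange and _buildReviewKeys relies on
message keys that begin with a 10-digit unix timestamp, so a sorted key
list can be searched with bisect.  A minimal standalone sketch of that
idea follows.  The names day_range and keys_for_day and the sample keys
are hypothetical.  Unlike the code above, it passes -1 as the DST flag
so that mktime decides for itself, and it uses bisect_left so the range
is half-open.

```python
import bisect
import time


def day_range(timestamp):
    """Return (start, end) unix timestamps of the local day holding timestamp."""
    this = time.localtime(timestamp)
    start = time.mktime((this[0], this[1], this[2], 0, 0, 0, 0, 0, -1))
    # 36 hours after midnight falls inside the next day even when a
    # daylight-saving shift makes the day 23 or 25 hours long.
    nxt = time.localtime(start + 36 * 60 * 60)
    end = time.mktime((nxt[0], nxt[1], nxt[2], 0, 0, 0, 0, 0, -1))
    return start, end


def keys_for_day(sorted_keys, timestamp):
    """Return the keys whose timestamp prefix lies in [start, end)."""
    start, end = day_range(timestamp)
    # Keys compare as strings, which matches numeric order while every
    # prefix has the same number of digits.
    lo = bisect.bisect_left(sorted_keys, "%010d" % int(start))
    hi = bisect.bisect_left(sorted_keys, "%010d" % int(end))
    return sorted_keys[lo:hi]


# Hypothetical keys built around one day, for illustration only.
when = 1050650672
start, end = day_range(when)
keys = ["%010d-a" % (int(start) - 10),   # previous day
        "%010d-b" % (int(start) + 5),    # this day
        "%010d-c" % (int(end) - 1),      # this day
        "%010d-d" % (int(end) + 7)]      # next day
print(keys_for_day(keys, when))
```

The neighbours at lo-1 and hi, when they exist, are the keys that give
the "previous day" and "next day" timestamps, which is how empty days
get skipped.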

--- NEW FILE: UserInterface.py ---
"""Web User Interface

Classes:
    UserInterfaceServer - Implements the web server component
                          via a Dibbler plugin.
    BaseUserInterface - Just has utilities for creating boxes and so forth.
                        (Does not include any pages)
    UserInterface - A base class for Spambayes web user interfaces.

Abstract:

This module implements a browser based Spambayes user interface.  Users can
*not* use this class directly (there is no 'home' page); developers should
subclass it to provide an appropriate interface for their application.

Functions deemed appropriate for all application interfaces are included.
These currently include:
  onClassify - classify a given message
  onWordquery - query a word from the database
  onTrain - train a message or mbox
  onSave - save the database and possibly shutdown
  onConfig - present the appropriate configuration page

To Do:

Web training interface:

 o Functional tests.
 o Keyboard navigation (David Ascher).  But aren't Tab and left/right
   arrow enough?


User interface improvements:

 o Once the pieces are on separate pages, make the paste box bigger.
 o Deployment: Windows executable?  atlaxwin and ctypes?  Or just
   webbrowser?
 o Save the stats (num classified, etc.) between sessions.
 o "Reload database" button.
 o Checkboxes need a default value (i.e. what to set the option as
   when no boxes are checked).  This needs to be thought about and
   then implemented.  add_id is an example of what it does at the
   moment.

 o Suggestions?

"""

# This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.

# This module was once part of pop3proxy.py; if you are looking through
# the history of the file, you may need to go back there.
# The options/configuration section started life in OptionConfig.py.
# You can find this file in the cvs attic if you want to trawl through
# its history.

__author__ = """Richie Hindle <richie at entrian.com>,
                Tim Stone <tim at fourstonesExpressions.com>"""
__credits__ = "Tim Peters, Neale Pickett, Tony Meyer, all the Spambayes folk."

try:
    True, False
except NameError:
    # Maintain compatibility with Python 2.2
    True, False = 1, 0

import re
import time
import email
import binascii
import cgi
import mailbox

import PyMeldLite
import Dibbler
import tokenizer
from Options import options, optionsPathname, defaults

IMAGES = ('helmet', 'status', 'config',
          'message', 'train', 'classify', 'query')

global classifier

class UserInterfaceServer(Dibbler.HTTPServer):
    """Implements the web server component via a Dibbler plugin."""

    def __init__(self, uiPort):
        Dibbler.HTTPServer.__init__(self, uiPort)
        print 'User interface url is http://localhost:%d/' % (uiPort)


class BaseUserInterface(Dibbler.HTTPPlugin):
    def __init__(self):
        Dibbler.HTTPPlugin.__init__(self)
        htmlSource, self._images = self.readUIResources()
        self.html = PyMeldLite.Meld(htmlSource, readonly=True)
  
    def onIncomingConnection(self, clientSocket):
        """Checks the security settings."""
        return options.html_ui_allow_remote_connections or \
               clientSocket.getpeername()[0] == clientSocket.getsockname()[0]

    def _writePreamble(self, name, parent=None, showImage=True):
        """Writes the HTML for the beginning of a page - time-consuming
        methlets use this and `_writePostamble` to write the page in
        pieces, including progress messages.  `parent` (if given) should
        be a pair: `(url, label)`, eg. `('review', 'Review')`."""

        # Take the whole palette and remove the content and the footer,
        # leaving the header and an empty body.
        html = self.html.clone()
        html.mainContent = " "
        del html.footer

        # Add in the name of the page and remove the link to Home if this
        # *is* Home.
        html.title = name
        if name == 'Home':
            del html.homelink
            html.pagename = "Home"
        elif parent:
            html.pagename = "> <a href='%s'>%s</a> > %s" % \
                            (parent[0], parent[1], name)
        else:
            html.pagename = "> " + name

        # Remove the helmet image if we're not showing it - this happens on
        # shutdown because the browser might ask for the image after we've
        # exited.
        if not showImage:
            del html.helmet

        # Strip the closing tags, so we push as far as the start of the main
        # content.  We'll push the closing tags at the end.
        self.writeOKHeaders('text/html')
        self.write(re.sub(r'</div>\s*</body>\s*</html>', '', str(html)))

    def _writePostamble(self):
        """Writes the end of time-consuming pages - see `_writePreamble`."""
        footer = self.html.footer.clone()
        footer.timestamp = time.asctime(time.localtime())
        # Write the clone, which carries the timestamp set above.
        self.write("</div>" + footer)
        self.write("</body></html>")

    def _trimHeader(self, field, limit, quote=False):
        """Trims a string, adding an ellipsis if necessary and HTML-quoting
        on request.  Also pumps it through email.Header.decode_header, which
        understands charset sections in email headers - I suspect this will
        only work for Latin character sets, but hey, it works for Francois
        Granger's name.  8-)"""

        try:
            sections = email.Header.decode_header(field)
        except (binascii.Error, email.Errors.HeaderParseError):
            sections = [(field, None)]
        field = ' '.join([text for text, unused in sections])
        if len(field) > limit:
            field = field[:limit-3] + "..."
        if quote:
            field = cgi.escape(field)
        return field

    def onHome(self):
        """Serve up the homepage."""
        raise NotImplementedError

    def _writeImage(self, image):
        self.writeOKHeaders('image/gif')
        self.write(self._images[image])

    # If you are easily offended, look away now...
    for imageName in IMAGES:
        exec "def %s(self): self._writeImage('%s')" % \
             ("on%sGif" % imageName.capitalize(), imageName)

    def _buildBox(self, heading, icon, content):
        """Builds a yellow-headed HTML box."""
        box = self.html.headedBox.clone()
        box.heading = heading
        if icon:
            box.icon.src = icon
        else:
            del box.iconCell
        box.boxContent = content
        return box

    def readUIResources(self):
        """Returns ui.html and a dictionary of Gifs."""

        # Import each image's resource module by name.  A non-empty
        # fromlist makes `__import__` return the submodule itself rather
        # than the top-level package.
        from spambayes.resources import ui_html
        images = {}
        for baseName in IMAGES:
            moduleName = '%s.%s_gif' % ('spambayes.resources', baseName)
            module = __import__(moduleName, {}, {}, ('spambayes', 'resources'))
            images[baseName] = module.data
        return ui_html.data, images


class UserInterface(BaseUserInterface):
    """Serves the HTML user interface."""

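    def __init__(self, bayes, config_parms=None, config_display=None):
        """Set up the interface for the given classifier.  `config_parms`
        maps form field names to (section, option) pairs; `config_display`
        lists the section headings and field names in display order."""
        global classifier
        BaseUserInterface.__init__(self)
        classifier = bayes
        # Avoid mutable default arguments; the map must be a dictionary
        # because it is queried with has_key() and values() below.
        self.parm_ini_map = config_parms or {}
        self.display = config_display or []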

    def onClassify(self, file, text, which):
        """Classify an uploaded or pasted message."""
        message = file or text
        message = message.replace('\r\n', '\n').replace('\r', '\n') # For Macs
        results = self._buildCluesTable(message)
        results.classifyAnother = self._buildClassifyBox()
        self._writePreamble("Classify")
        self.write(results)
        self._writePostamble()

    def _buildCluesTable(self, message, subject=None):
        cluesTable = self.html.cluesTable.clone()
        cluesRow = cluesTable.cluesRow.clone()
        del cluesTable.cluesRow   # Delete dummy row to make way for real ones
        (probability, clues) = classifier.spamprob(
            tokenizer.tokenize(message), evidence=True)
        for word, wordProb in clues:
            cluesTable += cluesRow % (cgi.escape(word), wordProb)

        results = self.html.classifyResults.clone()
        results.probability = probability
        if subject is None:
            heading = "Clues:"
        else:
            heading = "Clues for: " + subject
        results.cluesBox = self._buildBox(heading, 'status.gif', cluesTable)
        return results

    def onWordquery(self, word):
        if word == "":
            stats = "You must enter a word."
        else:
            word = word.lower()
            wordinfo = classifier._wordinfoget(word)
            if wordinfo:
                stats = self.html.wordStats.clone()
                stats.spamcount = wordinfo.spamcount
                stats.hamcount = wordinfo.hamcount
                stats.spamprob = classifier.probability(wordinfo)
            else:
                stats = "%r does not exist in the database." % cgi.escape(word)

        query = self.html.wordQuery.clone()
        query.word.value = word
        statsBox = self._buildBox("Statistics for %r" % cgi.escape(word),
                                  'status.gif', stats)
        queryBox = self._buildBox("Word query", 'query.gif', query)
        self._writePreamble("Word query")
        self.write(statsBox + queryBox)
        self._writePostamble()

    def onTrain(self, file, text, which):
        """Train on an uploaded or pasted message."""
        self._writePreamble("Train")

        # Upload or paste?  Spam or ham?
        content = file or text
        isSpam = (which == 'Train as Spam')

        # Convert platform-specific line endings into unix-style.
        content = content.replace('\r\n', '\n').replace('\r', '\n')

        # The upload might be a single message or an mbox file.
        messages = self._convertUploadToMessageList(content)

        # Append the message(s) to a file, to make it easier to rebuild
        # the database later.   This is a temporary implementation -
        # it should keep a Corpus of trained messages.
        if isSpam:
            f = open("_pop3proxyspam.mbox", "a")
        else:
            f = open("_pop3proxyham.mbox", "a")

        # Train on the uploaded message(s).
        self.write("<b>Training...</b>\n")
        self.flush()
        for message in messages:
            tokens = tokenizer.tokenize(message)
            classifier.learn(tokens, isSpam)
            f.write("From pop3proxy@spambayes.org Sat Jan 31 00:00:00 2000\n")
            f.write(message)
            f.write("\n\n")

        # Save the database and return a link Home and another training form.
        f.close()
        self._doSave()
        self.write("<p>OK. Return <a href='home'>Home</a> or train again:</p>")
        self.write(self._buildTrainBox())
        self._writePostamble()

    def _convertUploadToMessageList(self, content):
        """Returns a list of raw messages extracted from uploaded content.
        You can upload either a single message or an mbox file."""
        if content.startswith('From '):
            # Get a list of raw messages from the mbox content.
            class SimpleMessage:
                def __init__(self, fp):
                    self.guts = fp.read()
            contentFile = StringIO.StringIO(content)
            mbox = mailbox.PortableUnixMailbox(contentFile, SimpleMessage)
            return map(lambda m: m.guts, mbox)
        else:
            # Just the one message.
            return [content]

    def _doSave(self):
        """Saves the database."""
        self.write("<b>Saving... ")
        self.flush()
        classifier.store()
        self.write("Done</b>.\n")

    def onSave(self, how):
        """Command handler for "Save" and "Save & shutdown"."""
        isShutdown = how.lower().find('shutdown') >= 0
        self._writePreamble("Save", showImage=(not isShutdown))
        self._doSave()
        if isShutdown:
            self.write("<p>%s</p>" % self.html.shutdownMessage)
            self.write("</div></body></html>")
            self.flush()
            ## Is this still required?: self.shutdown(2)
            self.close()
            raise SystemExit
        self._writePostamble()

    def _buildClassifyBox(self):
        """Returns a "Classify a message" box.  This is used on both the Home
        page and the classify results page.  The Classify form is based on the
        Upload form."""

        form = self.html.upload.clone()
        del form.or_mbox
        del form.submit_spam
        del form.submit_ham
        form.action = "classify"
        return self._buildBox("Classify a message", 'classify.gif', form)

    def _buildTrainBox(self):
        """Returns a "Train on a given message" box.  This is used on both
        the Home page and the training results page.  The Train form is
        based on the Upload form."""

        form = self.html.upload.clone()
        del form.submit_classify
        return self._buildBox("Train on a given message", 'message.gif', form)

    def reReadOptions(self):
        """Called by the config page when the user saves some new options,
        or restores the defaults."""
        pass

    def onConfig(self):
        # Start with an empty config form then add the sections.
        html = self.html.clone()
        # "Save and Shutdown" is confusing here - it means "Save database"
        # but that's not clear.
        html.shutdownTableCell = "&nbsp;"
        html.mainContent = self.html.configForm.clone()
        html.mainContent.configFormContent = ""
        html.mainContent.optionsPathname = optionsPathname
        configTable = None
        section = None

        # Loop through the sections.
        for html_key in self.display:
            if not self.parm_ini_map.has_key(html_key):
                if configTable is not None and section is not None:
                    # Finish off the box for this section and add it
                    # to the form.
                    section.boxContent = configTable
                    html.configFormContent += section
                # Start the yellow-headed box for this section.
                section = self.html.headedBox.clone()
                # Get a clone of the config table and a clone of each
                # example row, then blank out the example rows to make way
                # for the real ones.
                configTable = self.html.configTable.clone()
                configTextRow1 = configTable.configTextRow1.clone()
                configCbRow1 = configTable.configCbRow1.clone()
                configRow2 = configTable.configRow2.clone()
                blankRow = configTable.blankRow.clone()
                del configTable.configTextRow1
                del configTable.configCbRow1
                del configTable.configRow2
                del configTable.blankRow
                section.heading = html_key
                del section.iconCell
                continue
            (sect, opt) = self.parm_ini_map[html_key]

            # Populate the rows with the details and add them to the table.
            if type(options.valid_input(sect, opt)) == type(""):
                # we provide a text input
                newConfigRow1 = configTextRow1.clone()
                newConfigRow1.label = options.display_name(sect, opt)
                newConfigRow1.input.name = html_key
                newConfigRow1.input.value = options.get(sect, opt)
            else:
                # we provide checkboxes/radio buttons
                newConfigRow1 = configCbRow1.clone()
                newConfigRow1.label = options.display_name(sect, opt)
                blankOption = newConfigRow1.input.clone()
                firstOpt = True
                i = 0
                for val in options.valid_input(sect, opt):
                    newOption = blankOption.clone()
                    if str(val) in str(options[sect, opt]).split():
                        newOption.input_box.checked = "checked"
                    # help for Python 2.2
                    if options.is_boolean(sect, opt):
                        if str(val) == "0":
                            val = "False"
                        elif str(val) == "1":
                            val = "True"
                    newOption.val_label = str(val)
                    if options.multiple_values_allowed(sect, opt):
                        newOption.input_box.type = "checkbox"
                        newOption.input_box.name = html_key + '-' + str(i)
                        i += 1
                    else:
                        newOption.input_box.type = "radio"
                        newOption.input_box.name = html_key
                    newOption.input_box.value = str(val)
                    if firstOpt:
                        newConfigRow1.input = newOption
                        firstOpt = False
                    else:
                        newConfigRow1.input += newOption
            # Insert the help text in a cell
            newConfigRow1.helpCell = '<strong>' + \
                                     options.display_name(sect, opt) + \
                                     ':</strong> ' + \
                                     cgi.escape(options.doc(sect, opt))

            newConfigRow2 = configRow2.clone()
            currentValue = options[sect, opt]
            # for Python 2.2
            if options.is_boolean(sect, opt):
                if str(currentValue) == '0':
                    currentValue = "False"
                elif str(currentValue) == '1':
                    currentValue = "True"
            newConfigRow2.currentValue = currentValue
            configTable += newConfigRow1 + newConfigRow2 + blankRow

        # Finish off the box for this section and add it to the form.
        if section is not None:
            section.boxContent = configTable
            html.configFormContent += section
        html.title = 'Home &gt; Configure'
        html.pagename = '&gt; Configure'
        self.writeOKHeaders('text/html')
        self.write(html)

    def onChangeopts(self, **parms):
        html = self.html.clone()
        html.shutdownTableCell = "&nbsp;"
        html.mainContent = self.html.headedBox.clone()
        errmsg = self.verifyInput(parms)
        if errmsg != '':
            html.mainContent.heading = "Errors Detected"
            html.mainContent.boxContent = errmsg
            html.title = 'Home &gt; Error'
            html.pagename = '&gt; Error'
            self.writeOKHeaders('text/html')
            self.write(html)
            return

        for name, value in parms.items():
            if self.parm_ini_map.has_key(name):
                sect, opt = self.parm_ini_map[name]
                options.set(sect, opt, value)

        op = open(optionsPathname, "r")
        options.update_file(op)
        op.close()
        self.reReadOptions()

        html.mainContent.heading = "Options Changed"
        html.mainContent.boxContent = "%s.  Return <a href='home'>Home</a>." \
                                      % "Options changed"
        html.title = 'Home &gt; Options Changed'
        html.pagename = '&gt; Options Changed'
        self.writeOKHeaders('text/html')
        self.write(html)

    def onRestoredefaults(self, how):
        self.restoreConfigDefaults()
        self.reReadOptions()

        html = self.html.clone()
        html.shutdownTableCell = "&nbsp;"
        html.mainContent = self.html.headedBox.clone()
        html.mainContent.heading = "Option Defaults Restored"
        html.mainContent.boxContent = "%s.  Return <a href='home'>Home</a>." \
                                      % "Defaults restored"
        html.title = 'Home &gt; Defaults Restored'
        html.pagename = '&gt; Defaults Restored'
        self.writeOKHeaders('text/html')
        self.write(html)

    def verifyInput(self, parms):
        '''Check that the given input is valid.'''
        # Most of the work here is done by the options class, but
        # we have a few extra checks that are beyond its capabilities
        errmsg = ''

        # mumbo-jumbo to deal with the checkboxes
        # XXX This will break with more than 9 checkboxes
        # XXX A better solution is needed than this
        for name, value in parms.items():
            if name[-2:-1] == '-':
                if parms.has_key(name[:-2]):
                    # The values are collected in a tuple, which has no
                    # append(), so extend it by concatenation.
                    parms[name[:-2]] += (value,)
                else:
                    parms[name[:-2]] = (value,)
                del parms[name]

        for html_key in self.display:
            if not self.parm_ini_map.has_key(html_key):
                nice_section_name = html_key
                continue
            sect, opt = self.parm_ini_map[html_key]
            if not parms.has_key(html_key):
                # This is a set of checkboxes where none are selected
                value = None
            else:
                value = parms[html_key]
            if value is not None:
                if type(value) == type((0,1)):
                    value_string = ""
                    for val in value:
                        value_string += val
                        value_string += ','
                    value = value_string[:-1]
                value = options.convert(sect, opt, value)
            if not options.is_valid(sect, opt, value):
                errmsg += '<li>\'%s\' is not a valid value for [%s] %s' % \
                          (value, nice_section_name,
                           options.display_name(sect, opt))
                if type(options.valid_input(sect, opt)) == type((0,1)):
                    errmsg += '. Valid values are: '
                    for valid in options.valid_input(sect, opt):
                        errmsg += str(valid) + ','
                    errmsg = errmsg[:-1] # cut last ','
                errmsg += '</li>'
            parms[html_key] = value

        # check for equal number of pop3servers and ports
        slist = parms['p3servers'].split(',')
        plist = parms['p3ports'].split(',')
        if len(slist) != len(plist):
            errmsg += '<li>The number of POP3 proxy ports specified ' + \
                      'must match the number of servers specified</li>\n'

        # check for duplicate ports
        plist.sort()
        for p in range(len(plist)-1):
            try:
                if plist[p] == plist[p+1]:
                    errmsg += '<li>All POP3 port numbers must be unique</li>'
                    break
            except IndexError:
                pass

        # check for equal number of smtpservers and ports
        slist = parms['smtpservers'].split(',')
        plist = parms['smtpports'].split(',')
        if len(slist) != len(plist):
            errmsg += '<li>The number of SMTP proxy ports specified ' + \
                      'must match the number of servers specified</li>\n'

        # check for duplicate ports
        plist.sort()
        for p in range(len(plist)-1):
            try:
                if plist[p] == plist[p+1]:
                    errmsg += '<li>All SMTP port numbers must be unique</li>'
                    break
            except IndexError:
                pass

        return errmsg

    def restoreConfigDefaults(self):
        # Note that the behaviour of this function has subtly changed:
        # previously, options were removed from the config file; now the
        # config file is updated to match the defaults.
        c = ConfigParser()
        # Match the module-style StringIO usage in
        # _convertUploadToMessageList.
        d = StringIO.StringIO(defaults)
        c.readfp(d)
        del d

        # Only restore the settings that appear on the form.
        for section, option in self.parm_ini_map.values():
            if not options.no_restore(section, option):
                options.set(section, option, c.get(section,option))

        op = open(optionsPathname, "r")
        options.update_file(op)
        op.close()

Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.7
retrieving revision 1.8
diff -C2 -d -r1.7 -r1.8
*** tokenizer.py	6 Mar 2003 15:47:19 -0000	1.7
--- tokenizer.py	18 Apr 2003 09:24:29 -0000	1.8
***************
*** 17,21 ****
      from sets import Set
  except ImportError:
!     from spambayes.compatsets import Set
  
  
--- 17,21 ----
      from sets import Set
  except ImportError:
!     from compatsets import Set
  
  




