[Spambayes] Adding a message database

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Wed Mar 5 23:08:22 EST 2003


3/5/2003 7:35:55 PM, "Mark Hammond" <mhammond at skippinet.com.au> wrote:

>It seems to me that sub-classing classifier to change storage semantics is
>wrong.  IMO, this should use delegation.  sub-classing of classifier should
>be used should the classification scheme want overriding, not the storage
>requirements.

Yes, I agree with this.  I think the same kind of argument applies to the 
message id database thing.  Assuming that there is a classifier subclass to 
manage message ids seems wrong.  And while I am here... ;)  assuming that 
classifier will be subclassed as some kind of persistent classifier seems 
wrong to me, too.

>
>This wouldn't be too hard to do - _setwordinfo() etc just delegate to a
>self.storage - and would make some sense to do as part of a "message
>database".

I wonder if delegate is the right pattern here.  Perhaps observer?
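
To make the two options concrete, here is a rough sketch of each. This is not actual Spambayes code: DictStorage, DelegatingClassifier, ObservableWordInfo, _getwordinfo, and the get/set/subscribe methods are names made up for illustration, and only _setwordinfo comes from Mark's note. With delegation the classifier forwards storage calls to self.storage; with observer the wordinfo object notifies whoever subscribed (a persistence layer, say) each time a record changes.

```python
class DictStorage:
    """Hypothetical in-memory storage backend (a plain dict)."""

    def __init__(self):
        self._data = {}

    def get(self, word):
        return self._data.get(word)

    def set(self, word, record):
        self._data[word] = record


class DelegatingClassifier:
    """Delegation: the classifier forwards storage calls to self.storage."""

    def __init__(self, storage):
        self.storage = storage

    def _getwordinfo(self, word):
        return self.storage.get(word)

    def _setwordinfo(self, word, record):
        self.storage.set(word, record)


class ObservableWordInfo:
    """Observer: wordinfo notifies listeners whenever a record changes."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def set(self, word, record):
        self._data[word] = record
        for callback in self._listeners:
            callback(word, record)


# Delegation in use: swapping the storage object changes persistence
# without touching the classifier class.
c = DelegatingClassifier(DictStorage())
c._setwordinfo("viagra", (5, 0))

# Observer in use: a stand-in "persistence" listener records each change.
saved = []
wi = ObservableWordInfo()
wi.subscribe(lambda word, record: saved.append((word, record)))
wi.set("viagra", (5, 0))
```

Either way the classifier no longer needs a subclass per storage mechanism; the difference is mostly whether the classifier knows about the storage (delegation) or the storage listens in on the classifier's data (observer).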

>
>Is there a compelling reason for it being the way it is?

Nope.

So... let's consider a strawman like this:


class Classifier:

    def __init__(self, wi):

        self.wordinfo = wi()

class WordInfo:
    """ In memory wordinfo class """

class PersistentWordInfo(WordInfo):
    """ Implements persistence as dbdict, let's forget pickles."""

import email.message
from time import time

class Message:
    """ Message abstraction """

    def __init__(self, id=None):
        """ All messages have an id """

        if id is None:
            self.id = time()  # make up some arbitrary id
        else:
            self.id = id

    def setPayload(self, payload):
        """ payload is delivered to an email.message.Message object """

        self.msg = email.message.Message()
        self.msg.set_payload(payload)

    """ have appropriate delegators to the Message object """

class FileMessage(Message):
    """ Message stored in a file system """

class MboxMessage(Message):
    """ Message stored in an mbox """

""" Perhaps other Message classes for various mechanisms, like Outlook,
Lotus, etc."""

class MessageSet:
    """ Iterable set of Message objects """

class FileMessageSet(MessageSet):
    """ Set of Messages in the file system """

class MboxMessageSet(MessageSet):
    """ Set of Messages in an mbox """

""" Perhaps other MessageSet classes for various mechanisms, like Outlook, 
Lotus, etc. """

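class NeverTrainedError(Exception):
    """ Raised by MessageIdDb.isSpam for an id that was never trained """

class Trainer:
    def __init__(self, wordinfo, idDb):
        """ Trains.  Some methods in this class will come from the current
        classifier class. """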

        self.wordinfo = wordinfo
        self.idDb = idDb

    def learn(self, msg, isSpam):
        """ unlearns if need be, then learns a message. """

        try:
            mstat = self.idDb.isSpam(msg.id)
        except NeverTrainedError:
            pass
        else:
            if isSpam != mstat:
                # undo the earlier training under its old classification
                self.unlearn(msg, mstat)

        self.wordinfo.learn(msg, isSpam)  # you get the idea
            
    def unlearn(self, msg, isSpam):
        """ unlearn previous training """

        self.wordinfo.unlearn(msg, isSpam)

class MessageIdDb:
    """ Maintains a persistent set of message ids and how they've been
    trained """

    def __init__(self, dbname):
        """ Assumes a particular persistence mechanism (pickle, bsddb,
        whatever) """

        self.dbname = dbname
        # do something to load

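    def rememberSpam(self, id):
        """ Record that this id was trained as spam """

    def rememberHam(self, id):
        """ Record that this id was trained as ham """

    def isSpam(self, id):
        """ True if trained as spam, False if ham; raises
        NeverTrainedError if the id was never trained """

    """ Iterable? """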

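For anyone who wants something to poke at, here is a stripped-down, runnable version of the Trainer / MessageIdDb interplay. It rests on a few assumptions of mine that the strawman does not settle: the id database is a plain dict with no persistence (DictMessageIdDb is a made-up name), wordinfo is a fake that only logs the calls it receives (LoggingWordInfo, also made up), Trainer.learn() records the new status in the id database itself, and retraining a message with the status it already has is treated as a no-op.

```python
class NeverTrainedError(Exception):
    """Raised when a message id has no recorded training."""


class DictMessageIdDb:
    """In-memory stand-in for MessageIdDb (no persistence)."""

    def __init__(self):
        self._trained = {}  # message id -> True (spam) / False (ham)

    def rememberSpam(self, id):
        self._trained[id] = True

    def rememberHam(self, id):
        self._trained[id] = False

    def isSpam(self, id):
        try:
            return self._trained[id]
        except KeyError:
            raise NeverTrainedError(id) from None


class LoggingWordInfo:
    """Fake wordinfo that only records the calls made on it."""

    def __init__(self):
        self.calls = []

    def learn(self, msg, isSpam):
        self.calls.append(("learn", msg.id, isSpam))

    def unlearn(self, msg, isSpam):
        self.calls.append(("unlearn", msg.id, isSpam))


class Message:
    """Bare-bones message: just an id."""

    def __init__(self, id):
        self.id = id


class Trainer:
    def __init__(self, wordinfo, idDb):
        self.wordinfo = wordinfo
        self.idDb = idDb

    def learn(self, msg, isSpam):
        try:
            mstat = self.idDb.isSpam(msg.id)
        except NeverTrainedError:
            pass  # first time we see this message
        else:
            if mstat == isSpam:
                return  # already trained this way (assumed no-op)
            self.wordinfo.unlearn(msg, mstat)  # undo the old training
        self.wordinfo.learn(msg, isSpam)
        if isSpam:
            self.idDb.rememberSpam(msg.id)
        else:
            self.idDb.rememberHam(msg.id)


wordinfo = LoggingWordInfo()
trainer = Trainer(wordinfo, DictMessageIdDb())
msg = Message("msg-1")
trainer.learn(msg, True)    # first training: learn as spam
trainer.learn(msg, True)    # same status again: nothing happens
trainer.learn(msg, False)   # correction: unlearn spam, then learn ham
```

After the three calls, wordinfo.calls holds one learn-as-spam, one unlearn-as-spam, and one learn-as-ham entry, which is the "unlearn if need be, then learn" behaviour the strawman is after.
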
Rip away, dudes... :)

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org




