[Spambayes] Adding a message database
Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Wed Mar 5 23:08:22 EST 2003
3/5/2003 7:35:55 PM, "Mark Hammond" <mhammond at skippinet.com.au> wrote:
>It seems to me that sub-classing classifier to change storage semantics is
>wrong. IMO, this should use delegation. sub-classing of classifier should
>be used should the classification scheme want overriding, not the storage
>requirements.
Yes, I agree with this. I think the same kind of argument applies to the
message id database thing. Assuming that there is a classifier subclass to
manage message ids seems wrong. And while I am here... ;) assuming that
classifier will be subclassed as some kind of persistent classifier seems
wrong to me, too.
>
>This wouldn't be too hard to do - _setwordinfo() etc just delegate to a
>self.storage - and would make some sense to do as part of a "message
>database".
I wonder if delegate is the right pattern here. Perhaps observer?
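For comparison, here is a minimal sketch of what plain delegation might look like. Only _setwordinfo() and self.storage come from Mark's note; the DictStorage class, the storage constructor argument, the _getwordinfo() counterpart, and the (spamcount, hamcount) record shape are all assumptions made for illustration.

```python
class DictStorage:
    """Hypothetical in-memory backend. A dbm- or pickle-backed class
    exposing the same two methods could be swapped in unchanged."""

    def __init__(self):
        self._data = {}

    def get(self, word):
        # Returns None for a word that was never stored.
        return self._data.get(word)

    def set(self, word, record):
        self._data[word] = record


class Classifier:
    """Keeps the classification logic; knows nothing about persistence."""

    def __init__(self, storage):
        self.storage = storage

    def _getwordinfo(self, word):
        return self.storage.get(word)

    def _setwordinfo(self, word, record):
        self.storage.set(word, record)


c = Classifier(DictStorage())
c._setwordinfo("viagra", (10, 0))  # assumed record: (spamcount, hamcount)
```

With this shape, changing the storage mechanism means passing a different storage object, and subclassing Classifier stays reserved for changing the classification scheme.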
>
>Is there a compelling reason for it being the way it is?
Nope.
So... let's consider a strawman like this:
class Classifier:
    def __init__(self, wi):
        # wi is a WordInfo class (not an instance); instantiate it here
        self.wordinfo = wi()

class WordInfo:
    """ In memory wordinfo class """

class PersistentWordInfo(WordInfo):
    """ Implements persistence as dbdict, let's forget pickles."""
from time import time
import email.message

class Message:
    """ Message abstraction """
    def __init__(self, id=None):
        """ All messages have an id """
        if id is None:
            self.id = time()  # make up some arbitrary id
        else:
            self.id = id
    def setPayload(self, payload):
        """ payload is delivered to an email.message.Message object """
        self.msg = email.message.Message()
        # set_payload() replaces the deprecated add_payload()
        self.msg.set_payload(payload)
    # have appropriate delegators to the Message object

class FileMessage(Message):
    """ Message stored in a file system """

class MboxMessage(Message):
    """ Message stored in an mbox """

# Perhaps other Message classes for various mechanisms, like Outlook,
# Lotus, etc.
class MessageSet:
    """ Iterable set of Message objects """

class FileMessageSet(MessageSet):
    """ Set of Messages in the file system """

class MboxMessageSet(MessageSet):
    """ Set of Messages in an mbox """

# Perhaps other MessageSet classes for various mechanisms, like Outlook,
# Lotus, etc.
class NeverTrainedError(Exception):
    """ Raised by the id database for a message it has never seen """

class Trainer:
    def __init__(self, wordinfo, idDb):
        """ Trains.  Some methods in this class will come from current
        classifier class. """
        self.wordinfo = wordinfo
        self.idDb = idDb
    def learn(self, msg, isSpam):
        """ unlearns if need be, then learns a message. """
        try:
            mstat = self.idDb.isSpam(msg.id)  # the id db is keyed by message id
        except NeverTrainedError:
            pass
        else:
            if isSpam != mstat:
                # undo the earlier training, which was recorded as mstat
                self.unlearn(msg, mstat)
        self.wordinfo.learn(msg, isSpam)  # you get the idea
    def unlearn(self, msg, isSpam):
        """ unlearn previous training """
        self.wordinfo.unlearn(msg, isSpam)
class MessageIdDb:
    """ Maintains a persistent set of message ids and how they've been
    trained"""
    def __init__(self, dbname):
        """ Assumes a particular persistence mechanism (pickle, bsddb,
        whatever)"""
        self.dbname = dbname
        # do something to load
    def rememberSpam(self, id):
        pass  # stub: record id as trained spam
    def rememberHam(self, id):
        pass  # stub: record id as trained ham
    def isSpam(self, id):
        pass  # stub: return how id was trained
    # Iterable?
Rip away, dudes... :)
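For anyone who wants to poke at the strawman, here is a small runnable sketch of the learn/unlearn flow under some loud assumptions: InMemoryIdDb and CountingWordInfo are hypothetical stand-ins (a dict instead of real persistence, bare word counts instead of real wordinfo records), remember() collapses rememberSpam/rememberHam into one call, a message is passed as an id plus a list of words, and retraining a message the same way is treated as a no-op. None of that is settled design.

```python
class NeverTrainedError(Exception):
    """Raised when a message id has no recorded training."""


class InMemoryIdDb:
    """Hypothetical stand-in for MessageIdDb that keeps ids in a dict."""

    def __init__(self):
        self._trained = {}  # message id -> True (spam) / False (ham)

    def remember(self, id, isSpam):
        self._trained[id] = isSpam

    def isSpam(self, id):
        try:
            return self._trained[id]
        except KeyError:
            raise NeverTrainedError(id)


class CountingWordInfo:
    """Hypothetical wordinfo that only counts words per class."""

    def __init__(self):
        self.counts = {}  # word -> [spam count, ham count]

    def learn(self, words, isSpam):
        for w in words:
            self.counts.setdefault(w, [0, 0])[0 if isSpam else 1] += 1

    def unlearn(self, words, isSpam):
        for w in words:
            self.counts[w][0 if isSpam else 1] -= 1


class Trainer:
    def __init__(self, wordinfo, idDb):
        self.wordinfo = wordinfo
        self.idDb = idDb

    def learn(self, id, words, isSpam):
        try:
            previous = self.idDb.isSpam(id)
        except NeverTrainedError:
            pass
        else:
            if previous == isSpam:
                return  # assumption: same training twice is a no-op
            self.wordinfo.unlearn(words, previous)  # undo the old training
        self.wordinfo.learn(words, isSpam)
        self.idDb.remember(id, isSpam)


trainer = Trainer(CountingWordInfo(), InMemoryIdDb())
trainer.learn("msg-1", ["cheap", "pills"], True)   # first trained as spam
trainer.learn("msg-1", ["cheap", "pills"], False)  # later corrected to ham
```

The point of the sketch is that Trainer only coordinates: the word counts and the id bookkeeping live in two separate objects that it is handed, so either one can be replaced without touching the other.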
c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org