[Spambayes] Using SpamBayes as a "remote filter"?

Tony Meyer tameyer at ihug.co.nz
Wed May 5 17:27:12 EDT 2004


> However: I have to travel a lot and so I'm sometimes 
> connecting with my modem, which means that I have to transfer 
> all spam to my notebook first.

This is a reasonably common situation, and there was even talk of
creating/adapting a script to handle this a while back.  Nothing (AFAIK) has
been done, though.

> I got a Linux (debian) server with a DSL connection. Is it 
> possible to use the Unix version of SpamBayes to check a 
> pop-mailbox, remove all spam mails from the mailbox and keep 
> the ham on the pop server of my provider?

Possible, yes.  With the existing scripts, no.  Any chance that you know
Python?  This would be reasonably simple to write.  Something like:

  * Use poplib to connect to the POP server.
  * Run through all the messages (you could later adapt this to look only
    at ones you haven't seen before).
    * For each message, call
      classifier.spamprob(spambayes.tokenizer.tokenize(messagetext)), where
      classifier is your trained database opened with
      spambayes.storage.open_storage.  This gives you the message's score.
    * If the score is > spambayes.Options.options["Categorization",
      "spam_cutoff"], then:
       * Save the message somewhere for later review.
       * Delete the message.
  * Repeat this script at whatever regularity is required.

This isn't tested, but would be a start:

"""
import os
import time
import poplib
from spambayes.tokenizer import tokenize
from spambayes.storage import open_storage
from spambayes.Options import options

#######################
# These need to be initialised to whatever is correct.
review_path = "~/review"
SERVER = "pop.example.com"
PORT = 110
USERNAME = "user"
PASSWORD = "pass"
#######################

# Open the trained classifier using the same storage settings as the
# rest of SpamBayes.
classifier = open_storage(options["Storage", "persistent_storage_file"],
                          options["Storage", "persistent_use_database"])
spam_cutoff = options["Categorization", "spam_cutoff"]

review_path = os.path.expanduser(review_path)
if not os.path.exists(review_path):
    print "Making review directory"
    os.makedirs(review_path)

spamcount = 0
p = poplib.POP3(SERVER, PORT)
p.user(USERNAME)
p.pass_(PASSWORD)
# list() returns (response, ["msg_num size", ...], octets).
for msg in p.list()[1]:
    msg_num, msg_size = msg.split()
    # retr() returns (response, lines, octets); rebuild the message text.
    messagetext = "\n".join(p.retr(msg_num)[1])
    score = classifier.spamprob(tokenize(messagetext))
    if score > spam_cutoff:
        # Keep a copy for later review before deleting it from the server.
        fn = os.path.join(review_path,
                          "%d_%d" % (spamcount, int(time.time())))
        f = open(fn, "w")
        f.write(messagetext)
        f.close()
        p.dele(msg_num)  # Only marks it; the server deletes it at QUIT.
        spamcount += 1
p.quit()
print "Removed", spamcount, "messages."
"""

> And: Is it possible to transfer the database of my existing 
> Windows installation to the new Unix installation by copying 
> the database files?

Transferring the databases is reasonably straightforward.  You *might* be
able to simply copy the database files: if you're using a pickle, that
would be fine.  If you're using one of the dbm modules (the default), then
it might work, depending on which dbm implementations (and versions) are
available on the Windows and Unix systems.
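If you want a hint in advance about whether a straight copy will work, Python's standard library can report which dbm flavour wrote a file (dbm.whichdb in Python 3, whichdb.whichdb in Python 2).  A minimal sketch, assuming the statistics database is a plain dbm file and you pass its path; describe_db is a hypothetical helper name.  If the module it reports on Windows can also be imported on the Unix box, copying has a reasonable chance; otherwise use the export/import route.

```python
import os

try:
    from dbm import whichdb        # Python 3
except ImportError:
    from whichdb import whichdb    # Python 2 fallback


def describe_db(path):
    """Report which dbm module appears to have written the database at path."""
    kind = whichdb(path)
    if kind is None:
        # whichdb couldn't open the file at all.
        return "missing or unreadable"
    if kind == "":
        # The file exists but its format wasn't recognised.
        return "unrecognised format"
    # Otherwise the name of the module, e.g. "dbm.dumb" on Python 3.
    return kind
```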

However, to get around this, you can use the sb_dbexpimp.py script.  On the
Windows system, use it to convert the statistics database (hammie.db; the
other database doesn't need to be copied) to either a pickle or a CSV file.
Then, on the Unix system, use the same script to convert from the CSV/pickle
back to dbm.
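sb_dbexpimp.py is the tool to use here, since it understands SpamBayes's record format.  Purely to illustrate why the round trip sidesteps the dbm-compatibility problem, here is a generic sketch for Python 3's dbm package (export_dbm and import_dbm are hypothetical helpers).  It copies raw key/value pairs into a pickle, which doesn't depend on the dbm library installed, and then into whatever dbm flavour the target machine defaults to.  It knows nothing about what SpamBayes stores in those values.

```python
import dbm
import pickle


def export_dbm(db_path, pickle_path):
    """Copy every raw key/value pair from a dbm file into a pickled dict."""
    db = dbm.open(db_path, "r")  # dbm.open detects the on-disk flavour
    try:
        data = {key: db[key] for key in db.keys()}
    finally:
        db.close()
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f, protocol=2)
    return len(data)


def import_dbm(pickle_path, db_path):
    """Load the pickled dict and write it into a fresh dbm file."""
    with open(pickle_path, "rb") as f:
        data = pickle.load(f)
    # "n" always creates a new, empty database in this machine's default flavour.
    db = dbm.open(db_path, "n")
    try:
        for key, value in data.items():
            db[key] = value
    finally:
        db.close()
    return len(data)
```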

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.
