Spambayes + HTTP proxy server

Skip Montanaro skip at pobox.com
Sun Feb 2 13:17:23 EST 2003


    >> Perfectly workable, though it would probably require some tweaks to
    >> the tokenizer to work as well as possible.
    ...
    Paul> The prototype turned out to be shorter than my original post,
    ...

This doesn't quite work right.  (Nor does the similar version I posted
earlier.)  The .filter() method gets passed an HTML response one chunk at a
time, not the entire thing, so each chunk would be scored on its own.  The
SpambayesFilter class should subclass BufferAllFilter instead.  Here's a
tweaked version of mine which does a better job:

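    import os

    from proxy3_filter import *
    import proxy3_options

    from spambayes import hammie, Options, mboxutils
    # Path to the persistent training database, taken from the Spambayes options.
    dbf = os.path.expanduser(Options.options.hammiefilter_persistent_storage_file)

    class SpambayesFilter(BufferAllFilter):
        # Class attribute: the database is opened once, read-only ('r'),
        # when the class is defined, and shared by all filter instances.
        hammie = hammie.open(dbf, 1, 'r')

        def filter(self, s):
            # Only score responses whose status code is 200.
            if self.reply.split()[1] == '200':
                # Score the server headers and body together, like a message.
                prob = self.hammie.score("%s\r\n%s" % (self.serverheaders, s))
                print "|  prob: %.5f" % prob
                if prob >= Options.options.spam_cutoff:
                    # Log the headers plus the first and last 40 characters.
                    print self.serverheaders
                    print "text:", s[0:40], "...", s[-40:]
            # The content is passed through unchanged; this filter only reports.
            return s

    from proxy3_util import *

    # Apply the filter to text/html responses.
    register_filter('*/*', 'text/html', SpambayesFilter)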
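
To illustrate why buffering matters, here is a minimal standalone sketch.
It does not use proxy3 or Spambayes at all: BufferingFilter, its finish()
method, and toy_score are names made up for this example, and toy_score is
only a crude stand-in for hammie's score().  The point is just that a word
split across two chunks is invisible to a per-chunk scorer but visible once
the chunks are joined.

```python
def toy_score(text):
    # Hypothetical stand-in for a real classifier: the fraction of
    # whitespace-separated words equal to "viagra".
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count("viagra") / float(len(words))


class BufferingFilter(object):
    # Hypothetical buffer-all filter (not the proxy3 API): it holds every
    # chunk and only scores once the whole response has arrived.
    def __init__(self, score):
        self.score = score
        self.chunks = []
        self.prob = None

    def filter(self, chunk):
        # Emit nothing yet; just remember the chunk.
        self.chunks.append(chunk)
        return ""

    def finish(self):
        # Join the chunks, score the complete body, and pass it through.
        body = "".join(self.chunks)
        self.prob = self.score(body)
        return body


if __name__ == "__main__":
    chunks = ["buy via", "gra now"]
    # Scored chunk by chunk, the split word is never seen.
    print([toy_score(c) for c in chunks])
    f = BufferingFilter(toy_score)
    for c in chunks:
        f.filter(c)
    print(f.finish())
    print(f.prob)
```

Scoring each chunk separately gives 0.0 both times here, while the buffered
body "buy viagra now" scores one word in three.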

Skip




