Graham's spam filter (was Lisp to Python translation criticism?)

Erik Max Francis max at alcyone.com
Sat Aug 17 19:29:00 EDT 2002


"John E. Barham" wrote:

> But I don't think that a pickled dictionary/database would be
> unmanageably huge, even w/ a large set of input messages, since the
> rate of growth of the "vocabulary" (i.e., set of tokens) would slow as
> more messages were input. The spam probability database in particular
> is smaller than the "good" and "bad" ones since it has a frequency
> threshold.

That's true, but if your spam filter acts as a standalone program (i.e.,
one that is simply invoked from your .qmail and/or .forward file), it's
going to have to read that probability database each time an email comes
in.  Updating the database is much more intensive, but can happen much
less often.
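
To make the per-message cost concrete, here is a minimal sketch, assuming
the probability table is a plain token -> probability dictionary.  The
function names and file layout are hypothetical, not taken from any
existing filter.  The first pair of functions deserializes the whole
pickle on every invocation; the second pair is one possible alternative
(my substitution, not something proposed in this thread) that keeps the
table in an on-disk dbm file and fetches only the tokens that appear in
the incoming message.

```python
import dbm
import pickle


def save_pickled(probs, path):
    """Write the whole token -> probability dict as a single pickle."""
    with open(path, "wb") as f:
        pickle.dump(probs, f)


def lookup_pickled(path, tokens):
    """Load the ENTIRE table, then pick out the tokens we need.

    A standalone filter pays this full load once per incoming message.
    """
    with open(path, "rb") as f:
        probs = pickle.load(f)
    return {t: probs[t] for t in tokens if t in probs}


def save_dbm(probs, path):
    """Write the table to an on-disk dbm file, one record per token."""
    with dbm.open(path, "n") as db:  # "n": always create a new, empty file
        for token, p in probs.items():
            db[token.encode("utf-8")] = repr(p).encode("ascii")


def lookup_dbm(path, tokens):
    """Fetch only the requested tokens; the rest of the table stays on disk."""
    found = {}
    with dbm.open(path, "r") as db:
        for t in tokens:
            key = t.encode("utf-8")
            if key in db:
                found[t] = float(db[key].decode("ascii"))
    return found
```

Either lookup would be called once per message with that message's
tokens.  Whether the dbm variant is actually faster depends on the table
size and the dbm backend available, so treat it as something to measure
rather than a given.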

-- 
 Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
 __ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/  \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
    Church / http://www.alcyone.com/pyos/church/
 A lambda calculus explorer in Python.


