Graham's spam filter (was Lisp to Python translation criticism?)

Erik Max Francis max at alcyone.com
Tue Aug 20 23:11:11 EDT 2002


David LeBlanc wrote:

> > signature::a
> > signature::ago
> > signature::been
> <snip>
> 
> What's the advantage of this?

Presumably he's trying to distinguish between words that appear in
different parts of a message, which seems a reasonable approach
(although dividing things based on the _signature_ is probably not
going to be very useful for spam).  I know that for my own rules-based
approach, it matters whether certain key words appear in (say) the
Subject line or the body, and presumably this would help a statistical
filtration system as well.  It may well be that Graham's approach simply
doesn't need this level of detail, but it certainly couldn't hurt to
think about when designing a system from scratch.
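
To make the idea concrete, here is a minimal sketch of location-tagged
tokens in the same `where::word` form as the quoted output.  It is only
an illustration under stated assumptions, not the code that produced
that output: the function name `tag_tokens`, the token regex, the
`subject`/`body` prefixes, and splitting the signature at the
conventional "-- " line are all my own choices, and multipart messages
are simply skipped.

```python
import email
import re

# Assumed token pattern: runs of letters, digits, $, ' and -.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tag_tokens(raw_message):
    """Return lowercase tokens prefixed with the part they came from."""
    msg = email.message_from_string(raw_message)
    tokens = []

    # Words from the Subject header get their own prefix.
    subject = msg.get("Subject", "")
    tokens += ["subject::" + w.lower() for w in TOKEN_RE.findall(subject)]

    # This sketch handles only single-part text messages.
    body = msg.get_payload()
    if not isinstance(body, str):
        body = ""

    # Assumption: the signature starts after a line holding just "-- ".
    main, _sep, signature = body.partition("\n-- \n")
    tokens += ["body::" + w.lower() for w in TOKEN_RE.findall(main)]
    tokens += ["signature::" + w.lower() for w in TOKEN_RE.findall(signature)]
    return tokens
```

A statistical filter could then count `subject::free` and `body::free`
as separate tokens, so a word's weight would depend on where it shows
up.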

> I agree that a complete mail program should have the ability to sort
> mail into many categories and this phase of the operation is not where
> to do it.  This is a pass/fail filtration step, not a sort step.

Yes, all that's being discussed here is a distinction between spam and
non-spam; any other filtering should be done by rules later on.

-- 
 Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
 __ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/  \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
    Church / http://www.alcyone.com/pyos/church/
 A lambda calculus explorer in Python.


