OT: spam filtering idea

Paul Rubin phr-n2002b at NOSPAMnightsong.com
Tue Jan 14 12:18:11 EST 2003


Skip Montanaro <skip at pobox.com> writes:
> Spambayes already looks at URLs.  Minimalist url-containing spam such as you
> mention tends to wind up "unsure" until I train on it.  Recent case in
> point, lots of spam coming from "big at boss.com".  Your message had nearly 20
> url:* tokens in it according to Spambayes tokenizer (sorted here from hammy
> to spammy):

Does Spambayes look at the charset?  I get tons of spam in Korean
characters.  Anything with charset="euc-kr" or "ks_c_5601-1987" etc.
is almost certainly spam.
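
To make the charset idea concrete, here is a rough sketch using only the
standard library's email package.  It is not how Spambayes does it; the
function names, the SUSPECT_CHARSETS set, and the "charset:" token format
(modeled on the url:* tokens mentioned above) are all my own assumptions.

```python
from email import message_from_string

# Charsets named in the post above; treating them as suspect is a
# per-user assumption, not a general rule.
SUSPECT_CHARSETS = {"euc-kr", "ks_c_5601-1987"}


def message_charsets(raw_message):
    """Return the set of charsets declared by any MIME part (lowercased)."""
    msg = message_from_string(raw_message)
    charsets = set()
    for part in msg.walk():
        # get_content_charset() returns None when no charset is declared.
        charset = part.get_content_charset()
        if charset:
            charsets.add(charset.lower())
    return charsets


def charset_tokens(raw_message):
    """Hypothetical tokens a Bayesian classifier could be trained on."""
    return sorted("charset:" + c for c in message_charsets(raw_message))


def has_suspect_charset(raw_message):
    """Hand-coded rule: true if any part declares a suspect charset."""
    return bool(message_charsets(raw_message) & SUSPECT_CHARSETS)
```

Feeding the tokens to a trained classifier would let it learn the
charset's weight per user, whereas has_suspect_charset() is the blunt
hand-coded version of the same test.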

Spambayes is already working better than SpamAssassin?  Wow.  I guess
I'll look into switching.  It's seemed to me up till now that it really
takes a mixture of dynamic (Bayesian) and hand-coded (SA) filtering.
I've heard the next version of SA will incorporate Bayesian filtering
in addition to what it already does.



