OT: spam filtering idea
Paul Rubin
phr-n2002b at NOSPAMnightsong.com
Tue Jan 14 12:18:11 EST 2003
Skip Montanaro <skip at pobox.com> writes:
> Spambayes already looks at URLs. Minimalist url-containing spam such as you
> mention tends to wind up "unsure" until I train on it. Recent case in
> point, lots of spam coming from "big at boss.com". Your message had nearly 20
> url:* tokens in it according to Spambayes tokenizer (sorted here from hammy
> to spammy):
Does Spambayes look at the charset? I get tons of spam in Korean
characters. Anything with charset="euc-kr" or "ks_c_5601-1987" etc.
is just about certainly spam.
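
To make the charset idea concrete, here is a rough sketch using only Python's standard email package. It is not taken from Spambayes; the "charset:" token prefix, the function names, and the SUSPECT_CHARSETS set are all made up for illustration. It pulls the declared charset out of each MIME part and turns it into a token that a Bayesian filter could train on, or that a hand-written rule could match directly.

```python
import email

# Hypothetical list for illustration: charsets this particular user never
# expects to see in legitimate mail. Adjust to taste.
SUSPECT_CHARSETS = {"euc-kr", "ks_c_5601-1987"}


def charset_tokens(raw_message):
    """Return sorted 'charset:<name>' tokens for every declared charset."""
    msg = email.message_from_string(raw_message)
    # get_charsets() yields one entry per MIME part, lowercased,
    # or None where a part declares no charset.
    return sorted({"charset:" + cs for cs in msg.get_charsets() if cs})


def has_suspect_charset(raw_message):
    """Hand-coded rule: True if any part declares a suspect charset."""
    return any(
        token.split(":", 1)[1] in SUSPECT_CHARSETS
        for token in charset_tokens(raw_message)
    )
```

Feeding the tokens to a trained classifier rather than hard-coding the rule would let the filter learn for itself which charsets are spammy for a given mailbox, which matters for anyone who does get real mail in Korean.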
Spambayes is already working better than SpamAssassin? Wow. I guess
I'll look into switching. It's seemed to me up till now that it really
takes a mixture of dynamic (Bayesian) and hand-coded (SA) filtering.
I've heard the next version of SA will incorporate Bayesian filtering
in addition to what it already does.