[spambayes-dev] Interesting unsure
Skip Montanaro
skip at pobox.com
Thu Jun 26 08:42:55 EDT 2003
Tim> For body (but not header) tokenization, the option
Tim> replace_nonascii_chars (off by default) is very effective against
Tim> junk like this, at least for those whose ham is mostly 7-bit ASCII.
Tim> That option replaces each "funny character" with a question mark.
Tim> So, e.g., any oddball spelling for "o" in "love" turns the token
Tim> into "l?ve";
I'd like to simply strip the accents. With the current scheme you still
wind up with four related tokens, "love", "l?ve", "lov?" and "l?v?", all
prefixed with "subject:". Since what the spammer wants you to read in all
instances is "love", I think that's the target we should aim at where
possible.
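
To make the difference concrete, here is a rough sketch comparing the two approaches on a few accented spellings of "love". It is not the SpamBayes tokenizer code: the helper names replace_nonascii and strip_accents are made up for illustration, and the accent stripping is assumed to be done with Unicode NFKD decomposition from the standard unicodedata module.

```python
import unicodedata


def replace_nonascii(text):
    # Hypothetical helper: mimics the behaviour described for
    # replace_nonascii_chars by turning each non-ASCII character into "?".
    return "".join(ch if ord(ch) < 128 else "?" for ch in text)


def strip_accents(text):
    # Hypothetical helper: decompose each character (NFKD), then drop the
    # combining marks so that only the base letters remain.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Four spellings a spammer might use: love, löve, lové, lövé.
variants = ["love", "l\u00f6ve", "lov\u00e9", "l\u00f6v\u00e9"]

replaced = {replace_nonascii(word) for word in variants}
stripped = {strip_accents(word) for word in variants}

print(sorted(replaced))  # four distinct tokens
print(sorted(stripped))  # a single token
```

Replacing yields the four tokens listed above, while stripping collapses them all to "love". Note that this only handles characters that decompose into a base letter plus combining marks; lookalikes such as "ø" or a Cyrillic "о" have no such decomposition and would pass through unchanged.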
Tim> Indeed, my Unsures this week are utterly dominated by trash
Tim> bouncing back to various webmaster and admin addresses due to the
Tim> Sobig worm forging sender addresses, like
...
Mine too.
Tim> It occurs to me that I haven't had "a spam problem" since last year
Tim> -- now I've got "a virus bounce problem" <0.5 wink>!
I just classify them as spam. It's actually unclear to me why these
"anti-virus" programs feel the need to reply to such messages. Most of the
time the sender is forged anyway, so the reply goes to someone who doesn't
have the virus.
Skip