[Spambayes] Latest spammer trick stymied

Tue Apr 1 21:02:54 EST 2003

> From: Tim Stone - Four Stones Expressions
> > That's right.  We really should try to solve this problem
> > with tokenization.
> 
> Silly question, but is there actually a problem? The system isn't
> expected to be 100% perfect. Is this happening often enough to justify
> the effort?

That's a very good question, actually. IMHO, it's happening often
enough. When your inbox is normally 99.9% spam-free, you notice when
a few of these low-mass particles suddenly start sneaking through...

> I get a reasonable number of virus mails from big at boss.com, they
> generally come in as "unsure". After I train on 5 or 6 of them,
> they start coming in as spam. No problem. Won't this work here
> as well?

Apparently not. My proxy catches viruses really well, too! This is a
bit different, in that these subatomics are sent from randomly
generated subdomains, with randomized senders, etc. So each message
carries only a minimal set of clues, and those clues change rapidly.
There's just no good way to train on them quickly enough. It's damn
annoying, is what it is...
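
To make the "rapidly-changing clue sets" point concrete, here's a rough
Python 3 sketch of one tokenization idea. It is not SpamBayes code: the
function name, the "url-host:" token prefix, and the example.com URLs
are all made up for illustration. It emits a token for every suffix of
a URL's hostname, so two messages with different random subdomains
still share at least one token the classifier could learn from.

```python
from urllib.parse import urlsplit


def host_suffix_tokens(url):
    """Return hypothetical 'url-host:' tokens for each hostname suffix.

    Illustrative only: neither the name nor the token format comes
    from the SpamBayes tokenizer.
    """
    host = (urlsplit(url).hostname or "").lower()
    labels = [part for part in host.split(".") if part]
    tokens = []
    # Walk from the full hostname down to the last two labels, so
    # a.b.example.com yields a.b.example.com, b.example.com, example.com.
    for i in range(len(labels) - 1):
        tokens.append("url-host:" + ".".join(labels[i:]))
    return tokens


# Two spams with different random subdomains (made-up URLs).
first = host_suffix_tokens("http://xk3q9.example.com/buy")
second = host_suffix_tokens("http://p7zz1.example.com/buy")

# The full-host tokens differ, but the shorter suffix is shared.
shared = set(first) & set(second)
```

Stopping at the last two labels is naive: under a suffix like co.uk it
would lump unrelated domains together, so a real experiment would want
a proper list of public suffixes. It also does nothing about the
randomized senders.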

> If the issue is with the person who was surprised that Spambayes
> didn't identify an "obvious" spam, maybe it's just an education
> issue.

Nope, the tester in question is a very educated consumer. I can see 
where you're going, but the general public expects a so-called 
"filtering proxy service" to work 100% of the time. And they're 
perplexed when it misses something they think is obvious.

But let's not worry about a URL slurper getting into the core 
SpamBayes code. It probably shouldn't. But certain individuals might 
want to experiment with the notion, and that's the kind of real-world 
testing that can only improve an already extraordinarily intelligent 
mail filter. Which is a Good Thing, I reckon...  :-)

Cheers,
Richard