[Spambayes] full o' spaces

Sat Mar 8 21:56:07 EST 2003

[Tim Stone]
> Ok, so let me summarize what I think our discussion has boiled down to.
>
> 1. We will not make changes that regress our results on existing spam.

There are two error rates, and an unsure rate, and they're all important.
I'm afraid that when someone sees a spam and suggests a gimmick to nail it,
they forget that it's also going to penalize some ham, and affect the unsure
rate too.  It's just human nature to fixate on potential benefits and
discount potential costs.  The point of statistical testing is to look at
all the effects.  A change that's a pure win on all counts has become
exceedingly hard to come up with.
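
To make "look at all the effects" concrete, here's a minimal sketch of scoring a proposed change on all three rates at once.  The names classify() and rates() and the cutoff values are assumptions made up for illustration, not the project's test driver.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a score in [0, 1] to 'ham', 'unsure' or 'spam'.

    The cutoff defaults are illustrative assumptions.
    """
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


def rates(scored, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return FP, FN and unsure rates for a labeled corpus.

    scored is an iterable of (is_spam, score) pairs.
    """
    n_ham = n_spam = fp = fn = unsure = 0
    for is_spam, score in scored:
        verdict = classify(score, ham_cutoff, spam_cutoff)
        if is_spam:
            n_spam += 1
        else:
            n_ham += 1
        if verdict == "unsure":
            unsure += 1
        elif is_spam and verdict == "ham":
            fn += 1      # spam that got through as ham
        elif not is_spam and verdict == "spam":
            fp += 1      # ham that got flagged as spam
    total = n_ham + n_spam
    return {
        "fp": fp / n_ham if n_ham else 0.0,
        "fn": fn / n_spam if n_spam else 0.0,
        "unsure": unsure / total if total else 0.0,
    }
```

Run rates() over the same corpus before and after a tokenizer change and compare all three numbers; a gimmick that lowers "fn" while raising "fp" or "unsure" is not a pure win.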

> 2. We will engage in ongoing analysis of spam, keeping our
> testing corpora up to date as best we can.  When significant (we have yet
> to define significant) amounts of FN start happening, we will adjust the
> tokenizer appropriately.

Or bad trends in FP or Unsure, and provided someone can dream up a gimmick
that addresses the problem du jour without damaging the things they're *not*
thinking about more than helping the thing they are thinking about.

> Point 1 is a given.  There seems to be considerable inertia in
> the project toward using point 2 as an ongoing strategy.

I watch my spam, ham and unsures closely, and check in a change whenever
there's an identifiable screwup.  For example, that's how the treatment of
embedded nonsense HTML tags got repaired a while ago, and very recently is
how unclosed HTML start-comment tags stopped being a problem.

I'm not seeing any loss of effectiveness in my own email, though, and it's
true I don't spend any time dreaming up ways to defeat the system.  So long
as spam uses the language and artifacts of advertising, and the tokenizer
sees those, it will be damned hard to get spam thru reliably -- and it will
be hard to get solicited commercial email thru too (it's still the case that
the first time or two I get a desired email from a given online business, it
rates Unsure or even as Spam -- it depends on how obnoxious it is).

Exceptions raised by the email pkg now appear to be the easiest approach to
hiding msg content from this particular system, and if I were a spammer
that's what I'd concentrate on.  Python allows very easy ways to catch
exceptions, though, so it's not something I'm frightened of -- we've added
alternative processing paths for email exceptions before, and we can add
more.  There's a systematic spambayes codebase problem, though, in that
people call the email pkg parsing functions directly, and that prevents
centralizing workarounds for pkg weaknesses that get discovered.
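
A rough sketch of what centralizing could look like: one wrapper that every caller uses instead of calling the email pkg directly, so a workaround only has to be added in one place.  The name safe_parse() and the particular fallback (keep the raw text as an opaque payload) are assumptions for illustration, not the codebase's actual function or policy.

```python
import email
import email.errors
import email.message


def safe_parse(text, parser=email.message_from_string):
    """Parse text into a Message, falling back if the email pkg complains.

    safe_parse is a hypothetical name.  The parser argument exists only
    so an alternative (or deliberately failing) parser can be swapped in.
    """
    try:
        return parser(text)
    except email.errors.MessageError:
        # Fallback path: don't lose the content.  Wrap the raw text in a
        # bare Message so the tokenizer still gets something to look at.
        msg = email.message.Message()
        msg.set_payload(text)
        return msg
```

If every caller goes through a function like this, a newly discovered pkg weakness gets its alternative processing path added once, inside the except clause, rather than at each call site.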

> I can live with it, because there's tremendous value in what we're doing,
> and it really does work.  I just have to say, though, that from a
> marketing viewpoint (believe it or not, I was a marketer in a former
> life), this strategy can potentially shoot us in the foot, because we
> aren't the ones finding problems, spammers are,

I've seen no evidence that they're finding anything to exploit here, and
doubt this particular project is popular enough for them to target.  Most
spam damaged enough to make the email pkg complain appears to me to be due
to spammer incompetence, or to bugs in the software they're using to
generate the spam.  If you want to see something break, give it to a 2-year
old <0.9 wink>.  At the moment, I have a grand total of one spam from my
personal email that still breaks the system (causes an email BoundaryError
exception that the Outlook client doesn't protect itself against), and
that's it, out of tens of thousands.  I got that email last December, and
haven't gotten another like it; I conclude it's evidence of a spammer who
didn't know what they were doing.

I confess I haven't fixed this bug, since it turned out to be a one-shot
thing and there are so many other things demanding my time.  Fixing a bug I
don't expect to see again just doesn't rate high enough to get done.

> and I think this could cause our users to lose faith in our product.  "I
> trained this stuff as spam, and this thing STILL doesn't catch it."

That irritation can occur even when the system is working perfectly, alas.
The flip side is that the lack of special cases to *force* classification as
one thing or another also makes it impossible to attack such a subsystem:
"preponderance of evidence" is the only way to get a score out of the
system.
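
To illustrate "preponderance of evidence", here's a small sketch of chi-squared (Fisher-style) combining of per-token spam probabilities.  It's offered as one way such combining can work, under the assumption that each token has already been given a probability strictly between 0 and 1; the function names are made up for the example, and nothing here is claimed to be the project's exact code.

```python
import math


def chi2q(x2, v):
    """Probability that a chi-squared variable with v degrees of freedom
    (v even) is >= x2."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1].

    Every token contributes; no single token can force the result.
    """
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    # S grows toward 1 when the clues collectively look spammy,
    # H grows toward 1 when they collectively look hammy.
    S = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (S - H + 1.0) / 2.0
```

With ten hammy clues and one very spammy one, the score stays low: the lone strong clue gets outvoted, which is the property that makes a single planted token useless to an attacker.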

> If that happens to a user more than a few times, the conclusion will be
> that it doesn't work.  I'm telling you, it doesn't take but one bad
> article in a ZD publication, and it's all over with for us.

OTOH, one good article in a ZD publication would kill us with newbie support
requests too <0.5 wink>.