[Spambayes] Training

Tue Nov 19 23:18:30 2002

"Mark Hammond" <mhammond@skippinet.com.au> writes:

> My concern is almost identical though - the *next* email that looks the
> same.  Let's say I subscribe to a weekly newsletter.  This week's comes in,
> gets marked as unsure, so I train.  Next week's comes in - again, it scores
> as unsure.  Repeat ad nauseam.

Good point. That would be *really* annoying after a while.

> I saw this a real lot when I had a high ham:spam imbalance - training had no
> obvious effect.

This happened to me today, with Tim's new adjustment switched on, with
a 10:1 ham:spam imbalance. IIRC, Tim's change means that with this
sort of imbalance, ham clues will only have 10% of their normal
effect, so saying "This is ham" will be pretty much ignored :-(
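
To make the 10% figure concrete, here is a rough sketch. It assumes
the adjustment works by scaling down the evidence count of the
over-represented class by the minority:majority ratio before the usual
smoothing toward a neutral 0.5. The function name, parameter names and
the default values of s and x are illustrative assumptions, not copied
from the Spambayes source.

```python
def word_spam_prob(spamcount, hamcount, nspam, nham,
                   s=0.45, x=0.5, adjust_imbalance=True):
    """Smoothed spam probability for one token (hypothetical helper).

    spamcount/hamcount: trained spam/ham messages containing the token.
    nspam/nham: total trained spam/ham messages (both assumed > 0).
    s, x: assumed strength and prior for unseen or rare tokens.
    """
    spamratio = spamcount / nspam
    hamratio = hamcount / nham
    if spamratio + hamratio == 0:
        return x  # token never seen: fall back to the neutral prior
    prob = spamratio / (spamratio + hamratio)

    if adjust_imbalance:
        # Assumed form of the adjustment: counts from the larger class
        # are discounted by the ratio of the smaller class to the larger.
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0

    # Effective amount of evidence behind this token.
    n = hamcount * spam2ham + spamcount * ham2spam
    return (s * x + n * prob) / (s + n)


# A token seen in exactly one ham and no spam, with 10:1 ham:spam.
plain = word_spam_prob(0, 1, nspam=100, nham=1000, adjust_imbalance=False)
adjusted = word_spam_prob(0, 1, nspam=100, nham=1000, adjust_imbalance=True)
print(plain, adjusted)  # roughly 0.16 versus 0.41
```

Under these assumptions, a clue from a single "This is ham" training
moves from about 0.16 to about 0.41, i.e. much closer to neutral, which
fits with the training appearing to be ignored.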

> I am still hoping to try Tim's new adjustment, but I wonder if
> somehow similar maths could be exploited.  For example, manually
> training a message could be seen as "intense training", whereas a
> normal train is - well - normal.  The point of manual training is
> that the system got it wrong, and the user wants to see the error
> stop.  "normal" training is just giving the system fairly "general"
> instructions.

I'm not sure. All training is basically saying "these specific
messages *are* ham/spam". Whether this is done in bulk, or on an
individual basis, shouldn't matter. A naive view says that trained
messages will therefore score 0/100 "by definition". But the maths
doesn't work like that, and nothing is going to make it do so.

But I think it's a reasonable assumption that any messages which have
been explicitly trained will no longer hit the "unsure" range. I just
can't see a way of making even that assumption be true.

Paul.
-- 
This signature intentionally left blank