[Spambayes] Training
Paul Moore
lists@morpheus.demon.co.uk
Tue Nov 19 23:18:30 2002
"Mark Hammond" <mhammond@skippinet.com.au> writes:
> My concern is almost identical though - the *next* email that looks the
> same. Let's say I subscribe to a weekly newsletter. This week's comes in,
> gets marked as unsure, so I train. Next week's comes in - again, it scores
> as unsure. Repeat ad nauseam.
Good point. That would be *really* annoying after a while.
> I saw this a lot when I had a high ham:spam imbalance - training had no
> obvious effect.
This happened to me today, with Tim's new adjustment switched on and
a 10:1 ham:spam imbalance. IIRC, Tim's change means that with this
sort of imbalance, ham clues have only 10% of their normal effect,
so saying "This is ham" is pretty much ignored :-(
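To put rough numbers on the 10% point, here is a minimal sketch of a
SpamBayes-style word probability using Robinson's smoothing (s=0.45 and
x=0.5 are assumed defaults). The `scale_ham` parameter is hypothetical:
it just counts each ham sighting as nspam/nham of a sighting, as a
stand-in for the kind of adjustment described above. It is not Tim's
actual code, and the corpus sizes are made up for illustration.

```python
def word_spamprob(spamcount, hamcount, nspam, nham,
                  s=0.45, x=0.5, scale_ham=1.0):
    """Robinson-smoothed spam probability for a single word.

    s and x act as the "unknown word strength" and "unknown word
    probability"; scale_ham is a HYPOTHETICAL knob that discounts ham
    evidence, standing in for an imbalance adjustment.
    """
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    if spamratio + hamratio == 0.0:
        return x  # never seen: fall back to the prior
    raw = spamratio / (spamratio + hamratio)
    # Effective amount of evidence; each ham sighting counts scale_ham.
    n = spamcount + hamcount * scale_ham
    return (s * x + n * raw) / (s + n)


nham, nspam = 1000, 100  # a 10:1 ham:spam imbalance
# A newsletter-only word, after training that one message as ham.
plain = word_spamprob(0, 1, nspam, nham)
adjusted = word_spamprob(0, 1, nspam, nham, scale_ham=nspam / nham)
print(f"unadjusted: {plain:.3f}  adjusted: {adjusted:.3f}")
```

Under these assumptions one ham training pulls the word to about 0.155
without the adjustment, but only to about 0.409 with it, which is still
close to the neutral 0.5.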
> I am still hoping to try Tim's new adjustment, but I wonder if
> somehow similar maths could be exploited. For example, manually
> training a message could be seen as "intense training", whereas a
> normal train is - well - normal. The point of manual training is
> that the system got it wrong, and the user wants to see the error
> stop. "normal" training is just giving the system fairly "general"
> instructions.
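One minimal way to read "intense training" is to count a manually
trained message as if it had been trained `weight` times. The sketch
below assumes that interpretation; `WeightedTrainer`, `weight` and the
smoothing constants (s=0.45, x=0.5) are hypothetical names and values
for illustration, not part of SpamBayes.

```python
from collections import Counter


class WeightedTrainer:
    """HYPOTHETICAL sketch of "intense" manual training: each train()
    call adds `weight` to word and message counts instead of 1, which
    is equivalent to training the same message `weight` times."""

    def __init__(self, nham=0.0, nspam=0.0):
        # nham/nspam let us pretend an existing database of that size.
        self.nham = float(nham)
        self.nspam = float(nspam)
        self.ham_words = Counter()
        self.spam_words = Counter()

    def train(self, words, is_spam, weight=1.0):
        counts = self.spam_words if is_spam else self.ham_words
        for w in set(words):  # each word counted once per message
            counts[w] += weight
        if is_spam:
            self.nspam += weight
        else:
            self.nham += weight

    def spamprob(self, word, s=0.45, x=0.5):
        """Robinson-smoothed spam probability for one word."""
        spamcount = self.spam_words[word]
        hamcount = self.ham_words[word]
        spamratio = spamcount / self.nspam if self.nspam else 0.0
        hamratio = hamcount / self.nham if self.nham else 0.0
        if spamratio + hamratio == 0.0:
            return x  # unseen word: fall back to the prior
        raw = spamratio / (spamratio + hamratio)
        n = spamcount + hamcount  # weighted amount of evidence
        return (s * x + n * raw) / (s + n)


newsletter = ["weekly", "digest", "unsubscribe"]
normal = WeightedTrainer(nham=1000, nspam=100)
normal.train(newsletter, is_spam=False)            # ordinary training
intense = WeightedTrainer(nham=1000, nspam=100)
intense.train(newsletter, is_spam=False, weight=5)  # "intense" training
print(round(normal.spamprob("weekly"), 3), round(intense.spamprob("weekly"), 3))
```

With a 10:1 database, ordinary training leaves "weekly" at about 0.155,
while weight=5 drives it to about 0.041. The obvious trade-off is that a
large weight behaves like repeating one message many times, so it may
overfit to that single message.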
I'm not sure. All training is basically saying "these specific
messages *are* ham/spam". Whether this is done in bulk, or on an
individual basis, shouldn't matter. A naive view would therefore say
that trained messages score 0/100 "by definition". But the maths
doesn't work like that, and nothing is going to make it do so.
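A minimal sketch of why: the message score comes from combining the
strongest per-word clues, not from remembering which messages were
trained. The code below is modelled on the chi-squared (Fisher)
combining SpamBayes used; the 0.1 minimum clue strength is an assumed
default, and the cap on the number of clues is omitted for brevity.

```python
import math


def chi2q(x2, v):
    """Survival function of the chi-squared distribution (even v only)."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_score(word_probs, min_strength=0.1):
    """Combine per-word spam probabilities (each strictly between 0 and 1)
    into a message score in [0, 1] using chi-squared combining."""
    clues = [p for p in word_probs if abs(p - 0.5) >= min_strength]
    if not clues:
        return 0.5  # no strong evidence either way
    n = len(clues)
    s = sum(math.log(1.0 - p) for p in clues)
    h = sum(math.log(p) for p in clues)
    s = 1.0 - chi2q(-2.0 * s, 2 * n)  # evidence the message is spam
    h = 1.0 - chi2q(-2.0 * h, 2 * n)  # evidence the message is ham
    return (s - h + 1.0) / 2.0


print(chi2_score([0.45, 0.55, 0.5]))  # only weak clues
print(chi2_score([0.25, 0.75]))       # clues that cancel out
print(chi2_score([0.05] * 5))         # strongly hammy clues
```

A trained message whose words are still near 0.5, or whose hammy and
spammy clues balance, lands at 0.5 (squarely "unsure") however recently
it was trained; only a message dominated by strong clues gets close to 0
or 1.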
But I think it's a reasonable expectation that any messages which have
been explicitly trained will no longer hit the "unsure" range. I just
can't see a way of making even that expectation hold.
Paul.
--
This signature intentionally left blank