[Spambayes] Re: There Can Be Only One

Anthony Baxter anthony@interlink.com.au
Wed, 25 Sep 2002 16:30:45 +1000


>>> Tim Peters wrote
> Other people on this list are mining the headers.  I can't, until I get a
> single-source corpus.  This has also been a strength for us:  I started
> without looking at *any* headers (they were chopped off the msg first
> thing), only at content, and both error rates from the original scheme got
> chopped by at least a factor of 10 each since then.  It's hard to imagine
> that adding rich clues from the headers is going to make that worse,
> although only time and testing will tell.

On my personal corpus, now around 10k ham and 700 spam, I'm actually
seeing the full headers cause harm, not benefit. After some analysis,
it looks like the problem is that _most_ of my spam is delivered
directly to me, while most of my ham arrives via mailing lists. The
mailing-list headers therefore end up as strong ham clues, so the
occasional spam that arrives via a mailing list nearly always becomes a
false negative.
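
A toy calculation shows how this kind of skew can arise. This is only a
sketch: the corpus sizes are the rough ones quoted above, but the
list/direct split is invented for illustration, the header token is
hypothetical, and spamprob() is a simplified Graham-style ratio rather
than the exact scoring spambayes uses.

```python
# Toy illustration only: the delivery-path split below is invented, not measured.
N_HAM, N_SPAM = 10_000, 700  # rough corpus sizes from the post

# Hypothetical counts of messages that arrived via a mailing list (assumption).
HAM_VIA_LIST, SPAM_VIA_LIST = 9_000, 50


def spamprob(spam_count, ham_count, nspam=N_SPAM, nham=N_HAM):
    """Simplified Graham-style spam probability for one token.

    spam_count / ham_count are the numbers of spam / ham messages
    that contain the token.
    """
    spam_ratio = spam_count / nspam
    ham_ratio = ham_count / nham
    return spam_ratio / (spam_ratio + ham_ratio)


# A made-up header token present on every list-delivered message.
p_list = spamprob(SPAM_VIA_LIST, HAM_VIA_LIST)

# The complementary made-up token for directly delivered messages.
p_direct = spamprob(N_SPAM - SPAM_VIA_LIST, N_HAM - HAM_VIA_LIST)

print(f"list-delivered token:   {p_list:.3f}")
print(f"direct-delivered token: {p_direct:.3f}")
```

With these made-up numbers the list token scores well below 0.1, so it
pulls any message carrying it toward ham whatever the body says, and
spam that arrives through a list inherits that clue.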

More on that later - I'm currently processing the scheme-deathmatch.

Anthony

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.