[Spambayes] defaults vs. chi-square

T. Alexander Popiel popiel@wolfskeep.com
Mon, 14 Oct 2002 14:36:15 -0700


In message:  <LNBBLJKPBEHFEDALKOLCGEJCBLAB.tim.one@comcast.net>
             Tim Peters <tim.one@comcast.net> writes:
>
>[Alex]
>> Data/Ham/Set5/2745
>> prob = 0.685540245196
>
>How did this end up getting counted as an FP?  A score of 0.69 was very
>solidly in your middle ground.

You're right, I'm a twit who can't read.

Okay, where did those false positives really go?

>An odd thing is that you must have a lot of 'skip:z 70' (etc) tokens in your
>ham too, else these spamprobs wouldn't be so small.  Any idea where they
>come from?  It suggests the tokenizer is giving up on something it should
>really be picking apart -- but I don't have many of these in my ham, so I'm
>at a loss to guess where they come from.

I'm not sure offhand, either.  I'd have to work to track it down,
though... and as mentioned earlier, today is a lazy day.  My best
guess is a few base64 bits that didn't get decoded properly.
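
To illustrate that guess, here is a rough sketch and not the real
tokenizer code.  It assumes overlong words are summarized as
'skip:<first char> <length rounded down to a multiple of 10>', and
that anything longer than 12 characters counts as overlong; the names
skip_token, tokens and maxword are made up for this example.  Under
those assumptions, a base64 body that never got decoded would show up
as 76-character "words", which land in the 70 bucket:

```python
import base64


def skip_token(word):
    # Assumed summary form: first character plus length bucketed by tens.
    return "skip:%c %d" % (word[0], len(word) // 10 * 10)


def tokens(text, maxword=12):
    # Simplified stand-in for a word tokenizer: lowercase, split on
    # whitespace, keep normal-sized words, summarize overlong ones.
    for word in text.lower().split():
        if 3 <= len(word) <= maxword:
            yield word
        elif len(word) > maxword:
            yield skip_token(word)


# 171 raw bytes encode to three full base64 lines of 76 characters each.
raw = bytes(range(57)) * 3
encoded = base64.encodebytes(raw).decode("ascii")

result = list(tokens(encoded))
print(result)  # ['skip:a 70', 'skip:a 70', 'skip:a 70']
```

If that is what is going on, the first character of the token would
just be whatever the base64 line happened to start with.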

>You must have more French in your ham, then (else the French words wouldn't
>have low spamprobs).

Yes, I do, from you folks talking about French messages... this
mailing list is doing a fine job of polluting my corpora with
difficult messages. ;-)

- Alex