[Spambayes] defaults vs. chi-square
T. Alexander Popiel
popiel@wolfskeep.com
Mon, 14 Oct 2002 14:36:15 -0700
In message: <LNBBLJKPBEHFEDALKOLCGEJCBLAB.tim.one@comcast.net>
Tim Peters <tim.one@comcast.net> writes:
>
>[Alex]
>> Data/Ham/Set5/2745
>> prob = 0.685540245196
>
>How did this end up getting counted as an FP? A score of 0.69 was very
>solidly in your middle ground.
You're right, I'm a twit who can't read.
Okay, where did those false positives really go?
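
For reference, the three-way split behind Tim's point can be sketched roughly as below. The cutoff values (0.20 and 0.80) and the helper names classify and is_false_positive are assumptions for illustration only; the thread does not say which cutoffs were in effect for this run.

```python
# Hypothetical cutoffs; the actual values used in this test run are not
# stated anywhere in the thread.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.80


def classify(prob, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a score in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if prob < ham_cutoff:
        return "ham"
    if prob >= spam_cutoff:
        return "spam"
    return "unsure"  # the "middle ground"


def is_false_positive(prob, is_ham, spam_cutoff=SPAM_CUTOFF):
    """A false positive is a ham message scored at or above the spam cutoff."""
    return is_ham and prob >= spam_cutoff


# Data/Ham/Set5/2745 from the quoted text
print(classify(0.685540245196))
print(is_false_positive(0.685540245196, is_ham=True))
```

Under those assumed cutoffs the message lands in the unsure band, so it would not be counted as a false positive.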
>An odd thing is that you must have a lot of 'skip:z 70' (etc) tokens in your
>ham too, else these spamprobs wouldn't be so small. Any idea where they
>come from? It suggests the tokenizer is giving up on something it should
>really be picking apart -- but I don't have many of these in my ham, so I'm
>at a loss to guess where they come from.
I'm not sure offhand, either. I'd have to work to track it down,
though... and as mentioned earlier, today is a lazy day. My best
guess is a few base64 bits that didn't get decoded properly.
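
A minimal sketch of how undecoded base64 could turn into 'skip:' tokens, assuming the tokenizer replaces any word longer than some limit with a token built from the word's first character and its length rounded down to a multiple of 10. The limit of 12 and the function name tokenize_word are assumptions here, and the real tokenizer has further special cases that this leaves out.

```python
import base64

# Assumed maximum length for a word to be kept as-is.
MAX_WORD_SIZE = 12


def tokenize_word(word, max_word_size=MAX_WORD_SIZE):
    """Yield the word itself, or a 'skip:' summary token if it is too long."""
    n = len(word)
    if 3 <= n <= max_word_size:
        yield word
    elif n > max_word_size:
        # First character plus length rounded down to a multiple of 10,
        # in the same shape as the 'skip:z 70' token quoted above.
        yield "skip:%c %d" % (word[0], n // 10 * 10)


# One line of base64 that never got decoded: 54 bytes encode to 72 chars
# with no whitespace, so the tokenizer sees it as a single long "word".
raw_line = base64.b64encode(b"x" * 54).decode("ascii")

print(len(raw_line))
print(list(tokenize_word(raw_line)))
print(list(tokenize_word("hello")))
```

If something like this is happening, every undecoded line in a ham message contributes a skip token in the 70s, which would pull the spamprob of those tokens down.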
>You must have more French in your ham, then (else the French words wouldn't
>have low spamprobs).
Yes, I do, from you folks talking about French messages... this
mailing list is doing a fine job of polluting my corpora with
difficult messages. ;-)
- Alex