[Spambayes] RE: For the bold
Tim Peters
tim.one@comcast.net
Sat, 05 Oct 2002 20:46:32 -0400
Oops! I misread this data badly.
> Crunching the raw data via rmspik [from the original use_central_limit]:
>
> Reading clim.pik ...
> Nham= 7500
> RmsZham= 2.93763751621
> Nspam= 7500
> RmsZspam= 3.62374621717
> ======================================================================
> HAM:
> Sure/ok 7491
> Unsure/ok 8
> Unsure/not ok 1
> Sure/not ok 0
> Unsure rate = 0.12%
> Sure fp rate = 0.00%; Unsure fp rate = 11.11%
> ======================================================================
> SPAM:
> FALSE NEGATIVE: zham=4.22 zspam=-4.08 Data/Spam/Set4/3434.txt SURE!
> FALSE NEGATIVE: zham=4.55 zspam=-3.75 Data/Spam/Set4/635.txt SURE!
> FALSE NEGATIVE: zham=4.90 zspam=-3.41 Data/Spam/Set6/12822.txt SURE!
> FALSE NEGATIVE: zham=3.18 zspam=-5.12 Data/Spam/Set7/4234.txt SURE!
> FALSE NEGATIVE: zham=4.85 zspam=-3.45 Data/Spam/Set8/975.txt SURE!
> Sure/ok 0
> Unsure/ok 0
> Unsure/not ok 7495
> Sure/not ok 5
> Unsure rate = 99.93%
> Sure fn rate = 100.00%; Unsure fn rate = 100.00%
It's actually unsure about almost 100% of the spam! So this table's first
row:
>                   RMS ham unsure   RMS spam unsure
>                   --------------   ---------------
> central_limit                  9                 0
> central_limit2               175                77
> central_limit3               184               227
should have said
> central_limit                  9              7495
instead. I assume this is evidence of a bug somewhere. Note that the hmean
and smean for a msg are always identical under the original central limit
scheme.
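
For anyone who wants to check the corrected figure, here is a minimal sketch that recomputes the summary rates from the four counts in each block of the output above. The function name and dict keys are made up for illustration and are not part of rmspik; the formulas are an assumption, chosen because they reproduce the percentages the output prints (unsure rate = unsure / total, error rates = "not ok" / size of the sure or unsure group).

```python
def summarize(sure_ok, unsure_ok, unsure_bad, sure_bad):
    """Recompute summary rates from the four counts in one report block.

    Hypothetical helper, not taken from rmspik. The formulas are assumed
    from the percentages shown in the quoted output.
    """
    total = sure_ok + unsure_ok + unsure_bad + sure_bad
    unsure = unsure_ok + unsure_bad
    sure = sure_ok + sure_bad

    def pct(num, den):
        # Guard against an empty group rather than dividing by zero.
        return 100.0 * num / den if den else 0.0

    return {
        "n": total,
        "unsure": unsure,
        "unsure_rate": pct(unsure, total),
        "sure_error_rate": pct(sure_bad, sure),
        "unsure_error_rate": pct(unsure_bad, unsure),
    }


# Counts copied from the HAM and SPAM blocks of the quoted output.
ham = summarize(sure_ok=7491, unsure_ok=8, unsure_bad=1, sure_bad=0)
spam = summarize(sure_ok=0, unsure_ok=0, unsure_bad=7495, sure_bad=5)

for name, s in (("ham", ham), ("spam", spam)):
    print("%-4s n=%d unsure=%d unsure rate=%.2f%%"
          % (name, s["n"], s["unsure"], s["unsure_rate"]))
```

Under those assumptions the ham row gives 9 unsure (0.12%) and the spam row gives 7495 unsure (99.93%), matching the corrected table row.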