[Spambayes] Re: For the bold
Rob Hooft
rob@hooft.net
Sat, 05 Oct 2002 17:26:34 +0200
---------------------- multipart/mixed attachment
Another large message.
Attached is a PDF containing six histograms made using
max_discriminators=55.
The first one is zham for all ham messages. As you can see, the
distribution is asymmetric. Furthermore, a bell curve built from the
simple average and standard deviation does not follow the important
tail of the histogram: the probabilities in that tail will be severely
underestimated by these parameters.
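To illustrate the point about the tail, here is a minimal sketch on synthetic data (an exponential sample, not the Spambayes scores from the PDF). It fits a bell curve from the plain mean and standard deviation and compares the predicted tail fraction with the observed one. The function name, the threshold, and the choice of distribution are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean, stdev


def tail_fractions(samples, threshold):
    """Return (observed, predicted) fractions of samples above threshold.

    'predicted' comes from a normal curve fitted with the plain sample
    mean and standard deviation.
    """
    fitted = NormalDist(fmean(samples), stdev(samples))
    observed = sum(x > threshold for x in samples) / len(samples)
    predicted = 1.0 - fitted.cdf(threshold)
    return observed, predicted


# Synthetic, deliberately asymmetric data (illustrative only).
rng = random.Random(0)
skewed = [rng.expovariate(1.0) for _ in range(100_000)]

observed, predicted = tail_fractions(skewed, threshold=5.0)
print("observed tail fraction: ", observed)
print("predicted by bell curve:", predicted)
```

On skewed data like this, the fitted curve assigns far less probability to the tail than the sample actually contains, which is the kind of underestimate described above.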
The second one is abs(zham) for all ham messages. The bell curve fits
this histogram much better!
The third page is zspam for all spam messages.
The fourth page is abs(zspam) for all spam messages. Also much better.
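One possible reading of the abs() histograms, sketched here as an assumption rather than as the method actually used for the PDF: if a well-calibrated Z-score should have unit spread, then the width of a zero-centered bell curve fitted to abs(z) is the root-mean-square of the scores, and that width is the factor by which the scores are inflated. The data below is synthetic and the names rms_scale and true_factor are hypothetical.

```python
import math
import random


def rms_scale(z_values):
    """Root-mean-square of the scores: the width of a zero-centered
    bell curve that matches the histogram of abs(z)."""
    return math.sqrt(sum(z * z for z in z_values) / len(z_values))


# Synthetic scores: unit-normal values inflated by a made-up factor.
rng = random.Random(1)
true_factor = 6.7  # hypothetical inflation, chosen for illustration
z = [true_factor * rng.gauss(0.0, 1.0) for _ in range(100_000)]

scale = rms_scale(z)
corrected = [value / scale for value in z]

print("estimated scale factor:", round(scale, 2))
print("RMS after correction:  ", round(rms_scale(corrected), 2))
```

Dividing by the estimated factor brings the spread of the synthetic scores back to one, which is what a properly scaled Z-score would show.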
Fifth and sixth are zspam for all ham and zham for all spam, just to
complete the picture.
From the second and fourth images, I drew the conclusion that my
Z-scores are overestimated by factors of 6.7 and 6.6, respectively. This
means, e.g., that the zspam for all ham distribution is not -53 +/- 20,
but -8 +/- 3, and the zham for all spam distribution is not
-43 +/- 18, but -6.4 +/- 2.6.
I will try a discriminator based on this.
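The rescaling arithmetic can be sketched as follows. This assumes that the factor of 6.6 applies to zspam and 6.7 to zham, which is an interpretation of the figures above and not something stated explicitly; the helper name rescale is hypothetical.

```python
def rescale(mean, sd, factor):
    """Divide the mean and spread of a Z-score distribution by a factor."""
    return mean / factor, sd / factor


# Assumed pairing: 6.6 for zspam, 6.7 for zham.
zspam_for_ham = rescale(-53.0, 20.0, 6.6)
zham_for_spam = rescale(-43.0, 18.0, 6.7)

print("zspam for all ham:  %.1f +/- %.1f" % zspam_for_ham)
print("zham for all spam:  %.1f +/- %.1f" % zham_for_spam)
```

The results agree with the corrected figures quoted above to within rounding of the quoted inputs.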
Rob
--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/
---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: all.pdf
Type: application/pdf
Size: 56510 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021005/df86955c/all.pdf
---------------------- multipart/mixed attachment--