[spambayes-dev] Enhanced Outlook statistics display

Tim Peters tim.peters at gmail.com
Thu Dec 16 07:00:01 CET 2004


[Tony Meyer]
> Further to earlier discussion about calculating cost figures, and in
> case other people are interested and not aware of it, JGC's latest
> newsletter mentions this paper:
>
> <http://www.aueb.gr/users/ion/docs/mlnet_paper.pdf>

I'm sure we mentioned that paper here in the early days; note too that
Gary Robinson's oft-noted site has referred to it for approximately
forever, although via a different link:

    http://arxiv.org/abs/cs.CL/0006013

> And this page on the SpamAssassin wiki:
> 
> <http://wiki.apache.org/spamassassin/TotalCostRatio>
>
> I haven't had a chance to read through it in depth, but the "total
> cost ratio" appears to be more-or-less the same thing as the cost
> values that the spambayes testtools scripts produce (with the
> addition of an unsure weight).
>
> Maybe Tim knew of this and it's deliberate,

I knew the paper, and the choice to model costs in SpamBayes testing
in terms of hypothetical dollars charged to instances of different
kinds of errors was deliberate, but there's really no connection
between the two.  "Dollars and cents" models are simply intuitively
appealing to people regardless of statistical background, and I didn't
want the volunteer testers on this project to feel put out by a
measure that seemed esoteric.  I also didn't give a rip about
publishing results, so didn't feel compelled to use measures with
"lambdas" or "betas" just for academic brownie points <wink>.

> but, in any case, it is interesting to see that it has been used
> elsewhere.

If you're going to provide a single figure of merit, there are
constraints pushing in this direction.  The choice of a linear model
is convenient and arguably a good first-order (literally)
approximation to a realistic cost model.
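
For readers who want the linear model spelled out, here is a minimal
sketch of a "dollars and cents" figure of merit: each kind of error is
charged a fixed hypothetical price, and the total is the weighted sum.
The function name and the weights shown are illustrative assumptions
for this example, not values quoted in this thread.

```python
def total_cost(n_fp, n_fn, n_unsure,
               fp_weight=10.0, fn_weight=1.0, unsure_weight=0.2):
    """Linear cost in hypothetical dollars.

    n_fp      -- number of false positives (ham classified as spam)
    n_fn      -- number of false negatives (spam classified as ham)
    n_unsure  -- number of messages left in the unsure range
    The default weights are assumptions chosen for illustration only.
    """
    return (n_fp * fp_weight
            + n_fn * fn_weight
            + n_unsure * unsure_weight)


if __name__ == "__main__":
    # 2 false positives, 5 false negatives, 10 unsures
    print("total cost: $%.2f" % total_cost(2, 5, 10))
```

Because the model is linear, doubling every error count doubles the
cost, and changing one weight only rescales that error's contribution.
That simplicity is what makes it easy to explain, and also why it is
only a first-order approximation.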

> (I wish I had found this when I was writing my CEAS paper earlier
> in the year).

Then you should have asked <wink>.

