[spambayes-dev] Re: [Spambayes] Database cleaning?

T. Alexander Popiel popiel at wolfskeep.com
Sun Jun 1 21:46:21 EDT 2003


In message:  <1054430548.31.1335 at sake.mondoinfo.com>
             Matthew Dixon Cowles <matt at mondoinfo.com> writes:
>
>I tore that code out and instead hacked the classifier so that I
>could determine how soon after a word figures in scoring that it's
>used again. I think that the results are at least slightly
>interesting. Note that the histogram below is log scaled.

[ snip of histogram showing an apparent exponential
  dropoff in usage frequency ]

Yes, this is a very interesting result.  I'm not sure it's
actually useful, but it is pretty.

Another interesting thing to plot would be a histogram of how often,
on average, each token gets used, which might give us some idea of
how large a DB is actually useful.
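
To make that concrete, here is a rough sketch of the sort of tally I
have in mind. It assumes the classifier has already been hacked to keep
a per-token count of how many times each token figured in scoring;
use_counts, messages_scored and usage_rate_histogram are made-up names
for illustration, not anything in the spambayes code, and the sample
numbers are invented.

```python
import math
from collections import Counter


def usage_rate_histogram(use_counts, messages_scored):
    """Bucket tokens by their average uses per scored message.

    use_counts: hypothetical dict mapping token -> number of times the
        token figured in scoring (not something the stock classifier
        is known to record).
    messages_scored: total number of messages scored.

    Returns (never_used, hist), where hist maps floor(log10(rate)) to
    the number of tokens falling in that decade.
    """
    if messages_scored <= 0:
        raise ValueError("messages_scored must be positive")
    never_used = 0
    hist = Counter()
    for uses in use_counts.values():
        if uses <= 0:
            # log10 is undefined at zero, so count these separately.
            never_used += 1
            continue
        rate = uses / messages_scored
        hist[math.floor(math.log10(rate))] += 1
    return never_used, hist


# Invented sample data, purely to show the shape of the output.
sample_counts = {"the": 100, "viagra": 50, "python": 5, "xyzzy": 0}
never_used, hist = usage_rate_histogram(sample_counts, messages_scored=100)
for decade in sorted(hist):
    print("1e%d <= rate < 1e%d: %d tokens" % (decade, decade + 1, hist[decade]))
print("never used: %d tokens" % never_used)
```

If most of the tokens pile up in the lowest decades or in the
never-used bin, that would suggest a much smaller DB might score about
as well, though it would take a real test run to confirm that.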

- Alex


