[Spambayes] RE: Central Limit Theorem??!! :)

Tim Peters tim.one@comcast.net
Mon, 23 Sep 2002 02:20:47 -0400


[Gary Robinson]
> ...
> Now let's generate a population of f(w)'s as follows.
>
> We go through all the spam emails one by one, adding the 30 most extreme
> f(w)'s to our population (only making one addition for each unique
> occurrence of a word in the population).

I'm unclear on the intent of the parenthetical comment.  Suppose I have 1
spam with 500 occurrences of Nigeria, and 99 spam with 1 occurrence of
Nigeria each, and no other spam contains Nigeria.  Is f('Nigeria') to be
added in 1 time total, or 100 times total?  The code I have does the latter.
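
To make the two readings concrete, here's a small Python sketch.  It is not
the actual spambayes code:  the function names and the toy corpus are made up
for illustration, and it skips the "30 most extreme" selection entirely,
pretending every unique word in a message gets added.

```python
from collections import Counter

def count_per_message(messages):
    """Reading A: add a word once for each message that contains it."""
    population = Counter()
    for tokens in messages:
        for word in set(tokens):   # ignore repeats within a single message
            population[word] += 1
    return population

def count_once_overall(messages):
    """Reading B: add a word once, no matter how many messages contain it."""
    population = Counter()
    for tokens in messages:
        for word in set(tokens):
            population[word] = 1   # later messages do not add it again
    return population

# Hypothetical corpus: 1 spam with 500 occurrences of Nigeria,
# and 99 spam with 1 occurrence each.
spams = [["Nigeria"] * 500] + [["Nigeria"] for _ in range(99)]

per_message = count_per_message(spams)["Nigeria"]
once_overall = count_once_overall(spams)["Nigeria"]
print(per_message, once_overall)
```

Under these assumptions reading A adds f('Nigeria') 100 times and reading B
adds it once.  Neither reading adds it 599 times, since repeats within a
single message are collapsed either way.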