[Spambayes] RE: Central Limit Theorem??!! :)
Tim Peters
tim.one@comcast.net
Mon, 23 Sep 2002 02:20:47 -0400
[Gary Robinson]
> ...
> Now let's generate a population of f(w)'s as follows.
>
> We go through all the spam emails one by one, adding the 30 most extreme
> f(w)'s to our population (only making one addition for each unique
> occurrence of a word in the population).
I'm unclear on the intent of the parenthetical comment. Suppose I have 1
spam with 500 occurrences of Nigeria, and 99 spam with 1 occurrence of
Nigeria each, and no other spam contains Nigeria. Is f('Nigeria') to be
added in 1 time total, or 100 times total? The code I have does the latter.
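
To make the two readings concrete, here is a minimal sketch in Python. It is not the actual spambayes code: the function name count_additions and the per_message flag are hypothetical, and it skips the "30 most extreme" selection entirely, counting only how many times a word's f(w) would enter the population under each reading.

```python
from collections import Counter

def count_additions(messages, per_message=True):
    """Count how many times each word's f(w) would enter the population.

    messages: list of token lists, one list per spam (hypothetical input shape).
    per_message=True:  a word counts once for each message that contains it.
    per_message=False: a word counts once across the whole corpus.
    """
    additions = Counter()
    for tokens in messages:
        # set() collapses repeated occurrences within a single message
        for word in set(tokens):
            if per_message or word not in additions:
                additions[word] += 1
    return additions

# The example from the text: 1 spam with 500 occurrences, 99 spam with 1 each.
spam = [["Nigeria"] * 500] + [["Nigeria"] for _ in range(99)]
print(count_additions(spam, per_message=True)["Nigeria"])   # 100
print(count_additions(spam, per_message=False)["Nigeria"])  # 1
```

Under the first reading the 500 repeats in one message collapse to a single addition, but each of the 100 messages still contributes; under the second, the word contributes exactly once no matter how many messages contain it.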