[Spambayes] RE: Central Limit Theorem??!! :)

T. Alexander Popiel popiel@wolfskeep.com
Mon, 23 Sep 2002 10:35:46 -0700


In message:  <LNBBLJKPBEHFEDALKOLCKECGBGAB.tim.one@comcast.net>
             Tim Peters <tim.one@comcast.net> writes:
>[T. Alexander Popiel]
>> This can still be done in one pass.  While calculating mean and stddev,
>> keep a running list of the 30 highest and 30 lowest f(w), then at the end
>> figure out which of those 60 are most extreme.  (Assuming, of course,
>> that I'm properly interpreting what is meant by 'extreme'.)
>
>OK, it's really a three-pass process.
>
>You don't know what any of the f(w) are until the first pass is complete.
>Computing f(w) depends on first having seen every message that contains w.
>
>Computing f(w) is then a second pass, but doesn't require the source of
>messages, it just needs the accumulated counts from the first pass.
>
>The third pass is needed to find the 30 extremes for *each* training message
>(as Gary said).  This again requires knowing the set of words contained in
>each message.

Right.  Okay, I'll go hide my head in the sand, now...

- Alex
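
For reference, the three passes Tim describes above can be sketched roughly as follows. This is only an illustration, not the Spambayes code: the function names are made up, f(w) is assumed to be the Robinson-style estimate (s*x + n*p(w)) / (s + n), the values s=1.0 and x=0.5 are placeholders, and "extreme" is assumed to mean farthest from 0.5.

```python
from collections import Counter
import heapq


def count_words(messages):
    """Pass 1: needs the messages themselves.

    `messages` is an iterable of (words, is_spam) pairs. Each word is
    counted at most once per message.
    """
    spam_counts, ham_counts = Counter(), Counter()
    nspam = nham = 0
    for words, is_spam in messages:
        if is_spam:
            nspam += 1
            spam_counts.update(set(words))
        else:
            nham += 1
            ham_counts.update(set(words))
    return spam_counts, ham_counts, nspam, nham


def compute_f(spam_counts, ham_counts, nspam, nham, s=1.0, x=0.5):
    """Pass 2: needs only the accumulated counts, not the messages.

    Assumed formula: f(w) = (s*x + n*p(w)) / (s + n), where n is the
    number of training messages containing w. s and x are placeholders.
    """
    f = {}
    for w in set(spam_counts) | set(ham_counts):
        sc, hc = spam_counts[w], ham_counts[w]
        spam_ratio = sc / nspam if nspam else 0.0
        ham_ratio = hc / nham if nham else 0.0
        # w was seen at least once, so the denominator is nonzero.
        p = spam_ratio / (spam_ratio + ham_ratio)
        n = sc + hc
        f[w] = (s * x + n * p) / (s + n)
    return f


def extremes(words, f, k=30):
    """Pass 3: needs each message's word set again.

    Returns up to k words of one message whose f(w) is farthest
    from 0.5 (the assumed meaning of 'extreme').
    """
    return heapq.nlargest(k, set(words), key=lambda w: abs(f[w] - 0.5))


if __name__ == "__main__":
    training = [
        ({"cheap", "pills"}, True),
        ({"cheap", "meeting"}, False),
    ]
    f = compute_f(*count_words(training))
    for words, is_spam in training:
        print(is_spam, extremes(words, f, k=30))
```

The third function is where the one-pass idea breaks down: it cannot run until every f(w) is known, and it must be run once per training message.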