Ham or Spam? (was RE: [Spambayes] RE: Central Limit Theorem??!! :))

Tim Peters tim.one@comcast.net
Fri, 27 Sep 2002 12:23:39 -0400


FYI, another preliminary observation is that the logarithmic central-limit
scheme seems (in my data) to be very unsure (many sdevs away from both the
ham mean and the spam mean) about:

+ Msgs in German.
+ Msgs in French.
+ Msgs in Spanish.
+ Some msgs in Asian languages.
+ Msgs composed almost entirely of Javascript.
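
To make "high sdevs away from both means" concrete, here is a minimal
sketch of that kind of test. It is not the actual Spambayes code: the
function names, the (mean, sdev) tuples, and the unsure_z cutoff of 3.0 are
all assumptions made up for illustration. It measures how far a msg's
sample mean is from the ham and spam population means, in standard errors,
and refuses to guess when it is far from both.

```python
from math import sqrt


def zscore(sample_mean, pop_mean, pop_sdev, n):
    """Distance of sample_mean from pop_mean, in standard errors."""
    return abs(sample_mean - pop_mean) / (pop_sdev / sqrt(n))


def verdict(sample_mean, n, ham, spam, unsure_z=3.0):
    """ham and spam are hypothetical (mean, sdev) pairs from training.

    unsure_z is an illustrative cutoff, not a tuned value.
    """
    zham = zscore(sample_mean, ham[0], ham[1], n)
    zspam = zscore(sample_mean, spam[0], spam[1], n)
    # Far from both populations: the msg looks like neither.
    if min(zham, zspam) > unsure_z:
        return "unsure"
    return "ham" if zham < zspam else "spam"
```

A msg whose sample mean sits halfway between two well-separated population
means ends up many standard errors from each, so it comes back "unsure"
rather than being forced into one class.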

I'm inclined to call that a good sign:  except for the msgs in Javascript,
I rarely have a clue about whether these are ham or spam either.
WRT the Javascript, a possible weakness of our HTML stripping is that we
lose easy clues about script sections.
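
One way to keep such a clue, sketched under loud assumptions: this is not
the project's tokenizer, the function name and the "html:has-script" token
are hypothetical, and the tag regexp is deliberately crude. The idea is
just to note that a script tag was present before the tags are thrown
away, and to hand the classifier one synthetic token recording that fact.

```python
import re

# Crude patterns, for illustration only.
SCRIPT_OPEN_RE = re.compile(r"<script\b", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]*>")


def tokenize_keeping_script_clue(text):
    """Strip HTML tags, but remember whether a <script> tag was seen."""
    has_script = SCRIPT_OPEN_RE.search(text) is not None
    # Replace every tag with a space, then split on whitespace.
    tokens = TAG_RE.sub(" ", text).split()
    if has_script:
        # Hypothetical synthetic token standing in for the lost tag.
        tokens.append("html:has-script")
    return tokens
```

Whether such a token would actually help is something only a test run on
real data can say.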