[Spambayes] Tokenizing numbers and money
Rob Hooft
rob@hooft.net
Wed Oct 16 06:03:48 2002
Tim Peters wrote:
> I believe that, but it doesn't suggest anything to me other than that a
> sixth of your tokens contain at least 3 digits in a row -- how many contain
> at least 3 letters in a row <wink>?
Roughly two thirds.
I may try to tokenize the numbers. Many numbers are not hapaxes, but
I've seen ham significantly harmed by numbers that happened to be spam
clues. I have customers who send me their log files full of numbers!
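
One way this could look, as a rough sketch only: collapse each number into a coarse "shape" token so that a log file full of distinct numbers produces a handful of shared tokens instead of many individual clues. This is not the Spambayes tokenizer; the names generalize_number and tokenize and the token format are made up for illustration, and whether it actually helps would have to be tested on real ham and spam.

```python
import re

# Hypothetical helper, not part of Spambayes: matches a plain number or a
# dollar amount, e.g. "42", "1,299", "$1,299.00".
NUMBER_RE = re.compile(r"^(\$?)(\d[\d,]*)(\.\d+)?$")

def generalize_number(word):
    """Replace a number with a coarse token recording only its kind and length."""
    m = NUMBER_RE.match(word)
    if m is None:
        return word  # not a plain number or money amount; leave it alone
    currency, integer_part, _fraction = m.groups()
    digits = integer_part.replace(",", "")
    kind = "money" if currency else "number"
    return "%s:%d-digits" % (kind, len(digits))

def tokenize(text):
    """Split on whitespace and generalize any numeric words."""
    return [generalize_number(w) for w in text.split()]

print(tokenize("paid $1,299.00 for 42 widgets"))
```

With this scheme every five-digit number in a log file becomes the same token, so no single number can turn into a strong spam clue on its own.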
Rob
--
Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/