[Spambayes] Tokenizing numbers and money

Rob Hooft rob@hooft.net
Wed Oct 16 06:03:48 2002


Tim Peters wrote:
> I believe that, but it doesn't suggest anything to me other than that a
> sixth of your tokens contain at least 3 digits in a row -- how many contain
> at least 3 letters in a row <wink>?

Roughly two thirds.

I may try to tokenize the numbers. Many numbers are not hapaxes, but
I've seen ham significantly harmed by numbers that happened to be spam
clues. I have customers who send me their log files full of numbers!
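
A rough sketch of what tokenizing numbers could look like, under my own
assumptions: each numeric word is replaced by a coarse token that records
only whether it looks like money and how many digits its integer part
has, so an individual number from a log file cannot become a clue by
itself. The names generalize_number and tokenize, the regular
expressions, and the "number:N" / "money:N" token format are all
hypothetical; none of them are taken from the Spambayes tokenizer.

```python
import re

# Hypothetical patterns for illustration; not the Spambayes tokenizer's rules.
MONEY_RE = re.compile(r"^[$]\d[\d,]*(?:\.\d+)?$")
NUMBER_RE = re.compile(r"^\d[\d,]*(?:\.\d+)?$")


def integer_digits(word):
    """Count the digits before the first decimal point."""
    return sum(ch.isdigit() for ch in word.split(".")[0])


def generalize_number(word):
    """Map money and plain numbers to coarse tokens; pass other words through."""
    if MONEY_RE.match(word):
        return "money:%d" % integer_digits(word)
    if NUMBER_RE.match(word):
        return "number:%d" % integer_digits(word)
    return word


def tokenize(text):
    """Split on whitespace and generalize each numeric word."""
    return [generalize_number(word) for word in text.split()]
```

With this, tokenize("error 404 costs $1,299.00 today") would give
['error', 'number:3', 'costs', 'money:4', 'today']. Whether bucketing by
digit count helps or hurts would have to be tested against real ham and
spam.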

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/