[Spambayes] Tokens

Tim Peters tim.one at comcast.net
Sat Aug 16 12:59:50 EDT 2003


[Steve Davis]
> Spambayes is most certainly converging on 100%, just like in the ads.
> ;o)
>
> But I want to know more!  I have done my best to learn about message
> tokens.  I have RTFFAQ, I have browsed the troubleshooting guide, I
> have perused the home page all to no avail.
>
> Can someone please explain message tokens?  Technical answer is fine.

If you're not used to open source projects, you may not have noticed that we
give you all the source code <wink> -- everything there is to know about
spambayes tokens is in tokenizer.py.  Summarizing what's there is
necessarily inaccurate, because there's no "overriding principle":  a huge
variety of tokenization gimmicks have been tried, and the ones that survived
are the ones that testing said worked best.  So the mix of gimmicks we
currently use isn't driven by logic; it's driven by test results.  As a
result, it's quite a hodge-podge (for example, we preserve case in Subject
lines, but ignore case almost everywhere else, and there's no explanation
for that beyond archived test results).
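
To make the Subject-line example concrete, here is a toy sketch.  It is
emphatically not what tokenizer.py does:  the name toy_tokenize, the
plain whitespace split, and the "subject:" prefix are all assumptions made
up for illustration.  It only shows the idea of keeping case in one place
and folding it in another.

```python
import re

# Hypothetical helper, NOT the real tokenizer.py: words are just runs of
# non-whitespace characters.
WORD_RE = re.compile(r"\S+")

def toy_tokenize(subject, body):
    """Yield illustrative tokens for one message (assumed scheme)."""
    for word in WORD_RE.findall(subject):
        # Subject words keep their case; the made-up "subject:" prefix
        # keeps them distinct from body tokens.
        yield "subject:" + word
    for word in WORD_RE.findall(body):
        # Body words are case-folded.
        yield word.lower()

tokens = list(toy_tokenize("FREE Money Now", "Claim your FREE prize"))
print(tokens)
```

With this scheme "FREE" in the Subject and "FREE" in the body come out as
two different tokens ("subject:FREE" and "free"), so a classifier could
score them separately.  Whether that helps is exactly the kind of question
only testing can answer.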
