[Spambayes] Foreign language spam: bug or feature?
Tim Peters
tim.one@comcast.net
Fri Oct 25 17:36:08 2002
[Tim]
> ...
> Unless someone has a strong objection, I expect to introduce a new option:
>
> """
> [Tokenizer]
> # If true, replace high-bit characters (ord(c) >= 128) and
> # control characters with question marks. This allows
> # non-ASCII character strings to be identified with little
> # training and small database burden. It's appropriate only
> # if your ham is plain 7-bit ASCII, or nearly so, so that
> # the mere presence of non-ASCII character strings is known
> # in advance to be a strong spam indicator.
> replace_nonascii_chars: False
> """
This has been added, and is False by default. However, it's True by default
for users of the Outlook 2000 client, since I can't remember the last time
Mark or Sean asked me a question in Korean <wink>.
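What the option does is easiest to see on a few sample messages. Below is a minimal Python sketch of the replacement, not the actual SpamBayes tokenizer code. It assumes the message is handled as raw bytes, and that space, tab, CR and LF are exempt from the control-character rule (an assumption; the option text doesn't say so). The function name replace_nonascii_chars is hypothetical.

```python
# Hypothetical sketch of the replace_nonascii_chars behavior, NOT the real
# SpamBayes tokenizer. Works on raw bytes via a 256-entry translation table.

# Start with every byte mapping to b"?" ...
_table = bytearray(b"?" * 256)
# ... then let printable ASCII (32..126) map to itself.
for i in range(32, 127):
    _table[i] = i
# Assumption: ordinary whitespace is kept so word boundaries survive.
for b in b" \t\r\n":
    _table[b] = b
_TABLE = bytes(_table)


def replace_nonascii_chars(raw: bytes) -> bytes:
    """Replace high-bit bytes (>= 128) and control bytes with b'?'."""
    return raw.translate(_TABLE)


if __name__ == "__main__":
    # Korean text in EUC-KR is all high-bit bytes, so it collapses to '?' runs.
    print(replace_nonascii_chars("안녕 hello".encode("euc-kr")))
    print(replace_nonascii_chars(b"caf\xe9 a\x01b"))
```

Because every non-ASCII word collapses to a run of question marks, thousands of distinct Korean (or Chinese, or Russian) words turn into a handful of tokens like "????". That is why the classifier can learn them from little training and with a small database, and also why the option only makes sense when your ham is (nearly) pure 7-bit ASCII.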