[Spambayes] spamBayes is great, thank you all!

Tim Peters tim.one at comcast.net
Sun Oct 5 22:33:03 EDT 2003


[Tim]
>> If your primary language isn't English, that could explain it, as
>> *most* developers and testers here use English. If, for example,
>> your primary email language is German, then the Outlook addin's
>>
>>     [Tokenizer]
>>     replace_nonascii_chars: True
>>
>> setting may be inappropriate for you

[AndreasK]
> OK, I have changed it to:
> replace_nonascii_chars: False

> correct?

Yup!

> Do I have to RETRAIN?

For best results, yes.  The option affects the tokens that get stored into
your database.  For example, if Rückhalt appeared in email you trained on
while the option was True, r?ckhalt got stored in your database, but after
you change the option to False, rückhalt will get looked up in new email.
That won't match the r?ckhalt stored before.
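
Here's a rough sketch of that mismatch in Python.  It is not the actual
tokenizer code: the function names are made up for illustration, and it
assumes each non-ASCII character turns into a single question mark.

```python
# Illustrative sketch only; these helpers are hypothetical, not SpamBayes code.

def replace_nonascii(text):
    # Assumption: every character outside 7-bit ASCII becomes one "?".
    return "".join(ch if ord(ch) < 128 else "?" for ch in text)

def token(word, replace_nonascii_chars):
    # Lowercase the word, then optionally replace the non-ASCII characters.
    word = word.lower()
    if replace_nonascii_chars:
        word = replace_nonascii(word)
    return word

# Token stored during training, while the option was still True.
trained = token("Rückhalt", replace_nonascii_chars=True)

# Token looked up in new email, after the option was changed to False.
lookup = token("Rückhalt", replace_nonascii_chars=False)

print(trained, lookup, trained == lookup)
```

The two tokens differ, so whatever the database learned about the first one
doesn't help with the second.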

> Should I discard my "old" spambayes database before?

For best results, yes, and for the same reason.

>> and the default skip_max_word_size value of 12 may be too small
>> (13-character words like Unterstützung are hurt by both of those:
>> first the ü gets replaced by a question mark due to
>> replace_nonascii_chars, and then the whole word gets replaced by a
>> synthesized "skip: U 10" token because 13 > 12).

> Do you really think I should change that, too?

If and only if you want to experiment.  We didn't experiment with German
here -- nobody has, as far as I know.
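
If you do experiment, here's a sketch of what the two options do to a single
word.  The function is made up for illustration, not taken from the
tokenizer; the spelling of the skip token follows the example quoted above,
and rounding the length down to a multiple of 10 is an assumption based on
that same example.

```python
# Illustrative sketch only; tokenize_word is a hypothetical helper.

def tokenize_word(word, replace_nonascii_chars=True, skip_max_word_size=12):
    if replace_nonascii_chars:
        # Assumption: each non-ASCII character becomes one "?".
        word = "".join(ch if ord(ch) < 128 else "?" for ch in word)
    n = len(word)
    if n > skip_max_word_size:
        # Too long: keep only the first character and the length rounded
        # down to a multiple of 10 (assumed from the "skip: U 10" example).
        return "skip: %s %d" % (word[0], n // 10 * 10)
    return word

word = "Unterstützung"          # 13 characters

# Defaults: the word is over the limit and collapses into a skip token.
default_token = tokenize_word(word)

# Experimental settings: the word survives intact.  The limit of 20 is an
# arbitrary value chosen for this example, not a recommendation.
german_token = tokenize_word(word, replace_nonascii_chars=False,
                             skip_max_word_size=20)

print(default_token)
print(german_token)
```

With the defaults every long word starting with U collapses into the same
token; with the looser settings each one is kept as its own token, which is
why the database grows.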

> Any disadvantages?

Your database will grow larger, and it *may* work worse for you.  It's
impossible to guess without trying it.

You don't *have* to try it!  We're happy to have you here even if you "just
use" the program <wink>.

> WHERE, in which file (there are 2 Options.py and 2 tokenizer.py
> files containing skip_max_word_size)
> Why is there no .INI file for the main spambayes program?

If you're using the Outlook client, there's a file named

    default_bayes_customize.ini

That's the same .ini file you changed when you set replace_nonascii_chars to
False, and skip_max_word_size goes in that file too.
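
For example, the [Tokenizer] section of that file might end up looking like
the fragment below.  The value 20 is only a placeholder for the experiment,
not a tested recommendation.

```ini
[Tokenizer]
replace_nonascii_chars: False
skip_max_word_size: 20
```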



