[Spambayes-checkins] spambayes/spambayes Options.py,1.110,1.111
Tony Meyer
anadelonbrin at users.sourceforge.net
Thu Aug 5 02:56:06 CEST 2004
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv17846/spambayes
Modified Files:
Options.py
Log Message:
Goodbye to the last remnants of the experimental imbalance option. Having this
in your configuration file would do nothing these days anyway.
Also goodbye to two deprecated options: [Tokenizer] x-extract_dow and
[Tokenizer] x-generate_time_buckets. No-one objected on the list (and some
agreed), and they've been deprecated for a while. 1.0 (1.0.1, etc) users will
continue to get a "you are using a deprecated option" warning in their logs,
so by the time they move to 1.1, they should have stopped using them.
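
For readers who never used the two removed tokenizer options, here is a rough
sketch of the kind of tokens they produced, going only by the option
descriptions in the diff ('time:' hour ':' minute//10, and a day-of-week
token from the Date: header). The function name date_tokens, the exact token
spellings, and the 'dow:' prefix are assumptions for illustration; this is
not the code that was in the SpamBayes tokenizer.

```python
from datetime import date
from email.utils import parsedate_tz


def date_tokens(date_header):
    """Return illustrative time-bucket and day-of-week tokens.

    Hypothetical helper: the token formats are assumptions based on the
    option descriptions, not the original tokenizer code.
    """
    parsed = parsedate_tz(date_header)
    if parsed is None:
        # Unparseable Date: header, so no tokens are generated.
        return []
    year, month, day, hour, minute = parsed[:5]
    tokens = []
    # 10-minute bucket: 'time:' hour ':' minute//10
    tokens.append("time:%d:%d" % (hour, minute // 10))
    # Day of week recomputed from the calendar date (Monday == 0), since
    # the weekday field returned by parsedate_tz is documented as unusable.
    tokens.append("dow:%d" % date(year, month, day).weekday())
    return tokens


if __name__ == "__main__":
    print(date_tokens("Thu, 5 Aug 2004 02:56:06 +0200"))
```

Both tokens use the local time as written in the header; no timezone
normalisation is attempted in this sketch.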
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.110
retrieving revision 1.111
diff -C2 -d -r1.110 -r1.111
*** Options.py 21 Jul 2004 18:58:51 -0000 1.110
--- Options.py 5 Aug 2004 00:55:54 -0000 1.111
***************
*** 137,149 ****
INTEGER, RESTORE),
- ("x-generate_time_buckets", "Generate time buckets", False,
- """(DEPRECATED) Generate tokens which resemble the posting time
- in 10-minute buckets: 'time:' hour ':' minute//10""",
- BOOLEAN, RESTORE),
-
- ("x-extract_dow", "Extract day-of-week", False,
- """(DEPRECATED) Extract day of the week tokens from the Date: header.""",
- BOOLEAN, RESTORE),
-
("x-pick_apart_urls", "Extract clues about url structure", False,
"""(EXPERIMENTAL) Note whether url contains non-standard port or
--- 137,140 ----
***************
*** 468,499 ****
BOOLEAN, RESTORE),
- # If the # of ham and spam in training data are out of balance, the
- # spamprob guesses can get stronger in the direction of the category
- # with more training msgs. In one sense this must be so, since the more
- # data we have of one flavor, the more we know about that flavor. But
- # that allows the accidental appearance of a strong word of that flavor
- # in a msg of the other flavor much more power than an accident in the
- # other direction. Enable experimental_ham_spam_imbalance_adjustment if
- # you have more ham than spam training data (or more spam than ham), and
- # the Bayesian probability adjustment won't 'believe' raw counts more
- # than min(# ham trained on, # spam trained on) justifies. I *expect*
- # this option will go away (and become the default), but people *with*
- # strong imbalance need to test it first.\
- # LATER: this option sucked, creating more problems than it solved.
- # It's deprecated, and the support code has gone away.
-
- ("x-experimental_ham_spam_imbalance_adjustment", "Compensate for unequal numbers of spam and ham", False,
- """(DEPRECATED) If your training database has significantly more ham
- than spam, or vice versa, you may start seeing an increase in
- incorrect classifications (messages put in the wrong category, not
- just marked as unsure). If so, this option allows you to compensate
- for this, at the cost of increasing the number of messages classified
- as "unsure".
-
- Note that the effect is subtle, and you should experiment with both
- settings to choose the option that suits you best. You do not have
- to retrain your database if you change this option.""",
- BOOLEAN, RESTORE),
-
("x-use_bigrams", "Use mixed uni/bi-grams scheme", False,
"""(EXPERIMENTAL) Generate both unigrams (words) and bigrams (pairs of
--- 459,462 ----