[Spambayes-checkins] spambayes/spambayes Options.py,1.95,1.96
Tony Meyer
anadelonbrin at users.sourceforge.net
Sun Dec 21 23:19:58 EST 2003
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv21890/spambayes
Modified Files:
Options.py
Log Message:
Mark options as experimental/deprecated in options docstrings.
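A minimal sketch of how a tool could read the status tags this commit adds to the docstrings. The helper name split_status, the regular expression, and the shortened docstrings are illustrative assumptions, not SpamBayes code; only the tuple layout (name, display name, default, docstring, type, restore) and the "(EXPERIMENTAL)" / "(DEPRECATED)" prefixes come from the diff.

```python
import re

# Hypothetical helper, not part of SpamBayes: matches the leading status
# tag that this commit puts at the start of an option docstring.
_TAG = re.compile(r"^\s*\((EXPERIMENTAL|DEPRECATED)\)\s*")


def split_status(doc):
    """Return (status, text); status is None when the docstring has no tag."""
    match = _TAG.match(doc)
    if match is None:
        return None, doc.strip()
    return match.group(1).lower(), doc[match.end():].strip()


# Option tuples shaped like the ones in Options.py. The docstrings are
# abbreviated, and the type/restore fields are stand-in strings.
options = [
    ("x-use_bigrams", "Use mixed uni/bi-grams scheme", False,
     "(EXPERIMENTAL) Generate both unigrams (words) and bigrams.",
     "BOOLEAN", "RESTORE"),
    ("x-experimental_ham_spam_imbalance_adjustment",
     "Compensate for unequal numbers of spam and ham", False,
     "(DEPRECATED) If your training database has significantly more ham.",
     "BOOLEAN", "RESTORE"),
]

# Map each option name to its status tag.
statuses = {name: split_status(doc)[0] for name, _, _, doc, _, _ in options}
```

Keeping the tag at the very start of the docstring, rather than in the display name, means a single prefix match is enough to find it, and the display name stays free for user-facing text.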
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.95
retrieving revision 1.96
diff -C2 -d -r1.95 -r1.96
*** Options.py 18 Dec 2003 06:41:52 -0000 1.95
--- Options.py 22 Dec 2003 04:19:56 -0000 1.96
***************
*** 156,172 ****
("x-search_for_habeas_headers", "Search for Habeas Headers", False,
! """If true, search for the habeas headers (see http://www.habeas.com)
! If they are present and correct, this should be a strong ham sign, if
! they are present and incorrect, this should be a strong spam sign.""",
BOOLEAN, RESTORE),
("x-reduce_habeas_headers", "Reduce Habeas Header Tokens to Single", False,
! """If SpamBayes is set to search for the Habeas headers, nine tokens
! are generated for messages with habeas headers. This should be fine,
! since messages with the headers should either be ham, or result in FN
! so that we can send them to habeas so they can be sued. However, to
! reduce the strength of habeas headers, we offer the ability to reduce
! the nine tokens to one. (This option has no effect if
! search_for_habeas_headers is False)""",
BOOLEAN, RESTORE),
),
--- 156,173 ----
("x-search_for_habeas_headers", "Search for Habeas Headers", False,
! """(EXPERIMENTAL) If true, search for the habeas headers (see
! http://www.habeas.com). If they are present and correct, this should
! be a strong ham sign, if they are present and incorrect, this should
! be a strong spam sign.""",
BOOLEAN, RESTORE),
("x-reduce_habeas_headers", "Reduce Habeas Header Tokens to Single", False,
! """(EXPERIMENTAL) If SpamBayes is set to search for the Habeas
! headers, nine tokens are generated for messages with habeas headers.
! This should be fine, since messages with the headers should either be
! ham, or result in FN so that we can send them to habeas so they can
! be sued. However, to reduce the strength of habeas headers, we offer
! the ability to reduce the nine tokens to one. (This option has no
! effect if search_for_habeas_headers is False)""",
BOOLEAN, RESTORE),
),
***************
*** 177,184 ****
"URLRetriever" : (
("x-slurp_urls", "Tokenize text content at the end of URLs", False,
! """If this option is enabled, when a message normally scores in the
! 'unsure' range, and has fewer tokens than the maximum looked at,
! and contains URLs, then the text at those URLs is obtained and
! tokenized. If those tokens result in the message moving to a
score outside the 'unsure' range, then they are added to the
tokens for the message. This should be particularly effective
--- 178,185 ----
"URLRetriever" : (
("x-slurp_urls", "Tokenize text content at the end of URLs", False,
! """(EXPERIMENTAL) If this option is enabled, when a message normally
! scores in the 'unsure' range, and has fewer tokens than the maximum
! looked at, and contains URLs, then the text at those URLs is obtained
! and tokenized. If those tokens result in the message moving to a
score outside the 'unsure' range, then they are added to the
tokens for the message. This should be particularly effective
***************
*** 187,207 ****
("x-cache_expiry_days", "Number of days to store URLs in cache", 7,
! """This is the number of days that local cached copies of the text
! at the URLs will be stored for.""",
INTEGER, RESTORE),
("x-cache_directory", "URL Cache Directory", "url-cache",
! """So that SpamBayes doesn't need to retrieve the same URL
! over and over again, it stores local copies of the text
! at the end of the URL. This is the directory that will be
! used for those copies.""",
PATH, RESTORE),
("x-only_slurp_base", "Retrieve base url", False,
! """To try and speed things up, and to avoid following unique URLS, if
! this option is enabled, SpamBayes will convert the URL to as basic a
! form it we can. All directory information is removed and the domain
! is reduced to the two (or three for those with a country TLD) top-most
! elements. For example,
http://www.massey.ac.nz/~tameyer/index.html?you=me
would become
--- 188,208 ----
("x-cache_expiry_days", "Number of days to store URLs in cache", 7,
! """(EXPERIMENTAL) This is the number of days that local cached copies
! of the text at the URLs will be stored for.""",
INTEGER, RESTORE),
("x-cache_directory", "URL Cache Directory", "url-cache",
! """(EXPERIMENTAL) So that SpamBayes doesn't need to retrieve the same
! URL over and over again, it stores local copies of the text at the
! end of the URL. This is the directory that will be used for those
! copies.""",
PATH, RESTORE),
("x-only_slurp_base", "Retrieve base url", False,
! """(EXPERIMENTAL) To try and speed things up, and to avoid following
! unique URLS, if this option is enabled, SpamBayes will convert the URL
! to as basic a form it we can. All directory information is removed
! and the domain is reduced to the two (or three for those with a
! country TLD) top-most elements. For example,
http://www.massey.ac.nz/~tameyer/index.html?you=me
would become
***************
*** 224,231 ****
("x-web_prefix", "Prefix for tokens from web pages", "",
! """It may be that what is hammy/spammy for you in email isn't from
! webpages. You can then set this option (to "web:", for example),
! and effectively create an independent (sub)database for tokens
! derived from parsing web pages.""",
r"[\S]+", RESTORE),
),
--- 225,232 ----
("x-web_prefix", "Prefix for tokens from web pages", "",
! """(EXPERIMENTAL) It may be that what is hammy/spammy for you in email
! isn't from webpages. You can then set this option (to "web:", for
! example), and effectively create an independent (sub)database for
! tokens derived from parsing web pages.""",
r"[\S]+", RESTORE),
),
***************
*** 471,484 ****
# LATER: this option sucked, creating more problems than it solved.
# It's deprecated, and the support code has gone away.
- # XXX The "x-" prefix can't be "X-" instead, else it's considered
- # XXX an invalid option instead of a deprecated one. That behavior
- # XXX doesn't match the OptionsClass comments.
("x-experimental_ham_spam_imbalance_adjustment", "Compensate for unequal numbers of spam and ham", False,
! """If your training database has significantly more ham than
! spam, or vice versa, you may start seeing an increase in incorrect
! classifications (messages put in the wrong category, not just marked
! as unsure). If so, this option allows you to compensate for this, at
! the cost of increasing the number of messages classified as "unsure".
Note that the effect is subtle, and you should experiment with both
--- 472,483 ----
# LATER: this option sucked, creating more problems than it solved.
# It's deprecated, and the support code has gone away.
("x-experimental_ham_spam_imbalance_adjustment", "Compensate for unequal numbers of spam and ham", False,
! """(DEPRECATED) If your training database has significantly more ham
! than spam, or vice versa, you may start seeing an increase in
! incorrect classifications (messages put in the wrong category, not
! just marked as unsure). If so, this option allows you to compensate
! for this, at the cost of increasing the number of messages classified
! as "unsure".
Note that the effect is subtle, and you should experiment with both
***************
*** 487,495 ****
BOOLEAN, RESTORE),
! ("x-use_bigrams", "(EXPERIMENTAL) Use mixed uni/bi-grams scheme", False,
! """Generate both unigrams (words) and bigrams (pairs of words).
! However, extending an idea originally from Gary Robinson, the message
! is 'tiled' into non-overlapping unigrams and bigrams, approximating
! the strongest outcome over all possible tilings.
Note that to really test this option you need to retrain with it on,
--- 486,494 ----
BOOLEAN, RESTORE),
! ("x-use_bigrams", "Use mixed uni/bi-grams scheme", False,
! """(EXPERIMENTAL) Generate both unigrams (words) and bigrams (pairs of
! words). However, extending an idea originally from Gary Robinson, the
! message is 'tiled' into non-overlapping unigrams and bigrams,
! approximating the strongest outcome over all possible tilings.
Note that to really test this option you need to retrain with it on,
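The x-only_slurp_base docstring in the diff above describes dropping all directory information and keeping only the two (or three, for a country TLD) top-most domain elements. A rough sketch of that reduction follows. The function name reduce_to_base and the rule "a two-letter final label means a country TLD" are assumptions made for illustration; the actual SpamBayes code is not shown in this diff and may differ.

```python
from urllib.parse import urlsplit


def reduce_to_base(url):
    """Strip path and query, and trim the host to its top-most labels.

    Assumption: a two-letter final label is treated as a country TLD,
    so three labels are kept instead of two. Ports and bare IP
    addresses are not handled in this sketch.
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    keep = 3 if len(labels[-1]) == 2 else 2
    return f"{parts.scheme}://{'.'.join(labels[-keep:])}"


print(reduce_to_base("http://www.example.com/a/b.html?x=1"))
# prints http://example.com
print(reduce_to_base("http://www.example.co.uk/shop/item?id=7"))
# prints http://example.co.uk
```

Reducing URLs this way makes many distinct, per-recipient links collapse to one cache entry, which is the speed-up the docstring is aiming for.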