[Spambayes-checkins] spambayes/spambayes Options.py,1.95,1.96

Tony Meyer anadelonbrin at users.sourceforge.net
Sun Dec 21 23:19:58 EST 2003


Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs1:/tmp/cvs-serv21890/spambayes

Modified Files:
	Options.py 
Log Message:
Mark options as experimental/deprecated in options docstrings.



Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.95
retrieving revision 1.96
diff -C2 -d -r1.95 -r1.96
*** Options.py	18 Dec 2003 06:41:52 -0000	1.95
--- Options.py	22 Dec 2003 04:19:56 -0000	1.96
***************
*** 156,172 ****
  
      ("x-search_for_habeas_headers", "Search for Habeas Headers", False,
!      """If true, search for the habeas headers (see http://www.habeas.com)
!      If they are present and correct, this should be a strong ham sign, if
!      they are present and incorrect, this should be a strong spam sign.""",
       BOOLEAN, RESTORE),
  
      ("x-reduce_habeas_headers", "Reduce Habeas Header Tokens to Single", False,
!      """If SpamBayes is set to search for the Habeas headers, nine tokens
!      are generated for messages with habeas headers.  This should be fine,
!      since messages with the headers should either be ham, or result in FN
!      so that we can send them to habeas so they can be sued.  However, to
!      reduce the strength of habeas headers, we offer the ability to reduce
!      the nine tokens to one. (This option has no effect if
!      search_for_habeas_headers is False)""",
       BOOLEAN, RESTORE),
    ),
--- 156,173 ----
  
      ("x-search_for_habeas_headers", "Search for Habeas Headers", False,
!      """(EXPERIMENTAL) If true, search for the habeas headers (see
!      http://www.habeas.com). If they are present and correct, this should
!      be a strong ham sign, if they are present and incorrect, this should
!      be a strong spam sign.""",
       BOOLEAN, RESTORE),
  
      ("x-reduce_habeas_headers", "Reduce Habeas Header Tokens to Single", False,
!      """(EXPERIMENTAL) If SpamBayes is set to search for the Habeas
!      headers, nine tokens are generated for messages with habeas headers.
!      This should be fine, since messages with the headers should either be
!      ham, or result in FN so that we can send them to habeas so they can
!      be sued.  However, to reduce the strength of habeas headers, we offer
!      the ability to reduce the nine tokens to one. (This option has no
!      effect if search_for_habeas_headers is False)""",
       BOOLEAN, RESTORE),
    ),
***************
*** 177,184 ****
    "URLRetriever" : (
      ("x-slurp_urls", "Tokenize text content at the end of URLs", False,
!      """If this option is enabled, when a message normally scores in the
!      'unsure' range, and has fewer tokens than the maximum looked at,
!      and contains URLs, then the text at those URLs is obtained and
!      tokenized.  If those tokens result in the message moving to a
       score outside the 'unsure' range, then they are added to the
       tokens for the message.  This should be particularly effective
--- 178,185 ----
    "URLRetriever" : (
      ("x-slurp_urls", "Tokenize text content at the end of URLs", False,
!      """(EXPERIMENTAL) If this option is enabled, when a message normally
!      scores in the 'unsure' range, and has fewer tokens than the maximum
!      looked at, and contains URLs, then the text at those URLs is obtained
!      and tokenized.  If those tokens result in the message moving to a
       score outside the 'unsure' range, then they are added to the
       tokens for the message.  This should be particularly effective
***************
*** 187,207 ****
      
      ("x-cache_expiry_days", "Number of days to store URLs in cache", 7,
!      """This is the number of days that local cached copies of the text
!      at the URLs will be stored for.""",
       INTEGER, RESTORE),
  
      ("x-cache_directory", "URL Cache Directory", "url-cache",
!      """So that SpamBayes doesn't need to retrieve the same URL
!      over and over again, it stores local copies of the text
!      at the end of the URL.  This is the directory that will be
!      used for those copies.""",
       PATH, RESTORE),
  
      ("x-only_slurp_base", "Retrieve base url", False,
!      """To try and speed things up, and to avoid following unique URLS, if
!      this option is enabled, SpamBayes will convert the URL to as basic a
!      form it we can.  All directory information is removed and the domain
!      is reduced to the two (or three for those with a country TLD) top-most
!      elements.  For example,
           http://www.massey.ac.nz/~tameyer/index.html?you=me
       would become
--- 188,208 ----
      
      ("x-cache_expiry_days", "Number of days to store URLs in cache", 7,
!      """(EXPERIMENTAL) This is the number of days that local cached copies
!      of the text at the URLs will be stored for.""",
       INTEGER, RESTORE),
  
      ("x-cache_directory", "URL Cache Directory", "url-cache",
!      """(EXPERIMENTAL) So that SpamBayes doesn't need to retrieve the same
!      URL over and over again, it stores local copies of the text at the
!      end of the URL.  This is the directory that will be used for those
!      copies.""",
       PATH, RESTORE),
  
      ("x-only_slurp_base", "Retrieve base url", False,
!      """(EXPERIMENTAL) To try and speed things up, and to avoid following
!      unique URLS, if this option is enabled, SpamBayes will convert the URL
!      to as basic a form it we can.  All directory information is removed
!      and the domain is reduced to the two (or three for those with a
!      country TLD) top-most elements.  For example,
           http://www.massey.ac.nz/~tameyer/index.html?you=me
       would become
***************
*** 224,231 ****
  
      ("x-web_prefix", "Prefix for tokens from web pages", "",
!      """It may be that what is hammy/spammy for you in email isn't from
!      webpages.  You can then set this option (to "web:", for example),
!      and effectively create an independent (sub)database for tokens
!      derived from parsing web pages.""",
       r"[\S]+", RESTORE),
    ),
--- 225,232 ----
  
      ("x-web_prefix", "Prefix for tokens from web pages", "",
!      """(EXPERIMENTAL) It may be that what is hammy/spammy for you in email
!      isn't from webpages.  You can then set this option (to "web:", for
!      example), and effectively create an independent (sub)database for
!      tokens derived from parsing web pages.""",
       r"[\S]+", RESTORE),
    ),
***************
*** 471,484 ****
      # LATER:  this option sucked, creating more problems than it solved.
      # It's deprecated, and the support code has gone away.
-     # XXX The "x-" prefix can't be "X-" instead, else it's considered
-     # XXX an invalid option instead of a deprecated one.  That behavior
-     # XXX doesn't match the OptionsClass comments.
  
      ("x-experimental_ham_spam_imbalance_adjustment", "Compensate for unequal numbers of spam and ham", False,
!      """If your training database has significantly more ham than
!      spam, or vice versa, you may start seeing an increase in incorrect
!      classifications (messages put in the wrong category, not just marked
!      as unsure). If so, this option allows you to compensate for this, at
!      the cost of increasing the number of messages classified as "unsure".
  
       Note that the effect is subtle, and you should experiment with both
--- 472,483 ----
      # LATER:  this option sucked, creating more problems than it solved.
      # It's deprecated, and the support code has gone away.
  
      ("x-experimental_ham_spam_imbalance_adjustment", "Compensate for unequal numbers of spam and ham", False,
!      """(DEPRECATED) If your training database has significantly more ham
!      than spam, or vice versa, you may start seeing an increase in
!      incorrect classifications (messages put in the wrong category, not
!      just marked as unsure). If so, this option allows you to compensate
!      for this, at the cost of increasing the number of messages classified
!      as "unsure".
  
       Note that the effect is subtle, and you should experiment with both
***************
*** 487,495 ****
       BOOLEAN, RESTORE),
  
!     ("x-use_bigrams", "(EXPERIMENTAL) Use mixed uni/bi-grams scheme", False,
!      """Generate both unigrams (words) and bigrams (pairs of words).
!      However, extending an idea originally from Gary Robinson, the message
!      is 'tiled' into  non-overlapping unigrams and bigrams, approximating
!      the strongest outcome over all possible tilings.
  
       Note that to really test this option you need to retrain with it on,
--- 486,494 ----
       BOOLEAN, RESTORE),
  
!     ("x-use_bigrams", "Use mixed uni/bi-grams scheme", False,
!      """(EXPERIMENTAL) Generate both unigrams (words) and bigrams (pairs of
!      words). However, extending an idea originally from Gary Robinson, the
!      message is 'tiled' into non-overlapping unigrams and bigrams,
!      approximating the strongest outcome over all possible tilings.
  
       Note that to really test this option you need to retrain with it on,

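The log message says the options are now marked as experimental or deprecated in their docstrings. As a rough illustration of what that convention allows, here is a minimal sketch of reading the marker back out of an option docstring. The helper names (option_status, strip_marker), the abbreviated four-element tuples, and the unmarked "hypothetical_stable_option" entry are assumptions made for this example; they are not part of Options.py or of any SpamBayes API.

```python
import re

# Hypothetical helpers, not part of SpamBayes: they only illustrate how the
# "(EXPERIMENTAL)" / "(DEPRECATED)" docstring prefix could be detected.
_MARKER = re.compile(r"^\s*\((EXPERIMENTAL|DEPRECATED)\)\s*")


def option_status(doc):
    """Return 'experimental', 'deprecated', or 'unmarked' for a docstring."""
    match = _MARKER.match(doc)
    return match.group(1).lower() if match else "unmarked"


def strip_marker(doc):
    """Return the docstring without its leading status marker, if any."""
    return _MARKER.sub("", doc, count=1)


# Abbreviated option tuples: (name, display name, default, docstring).
# The real tuples in the diff carry two more fields (type and RESTORE).
options = [
    ("x-cache_expiry_days", "Number of days to store URLs in cache", 7,
     """(EXPERIMENTAL) This is the number of days that local cached copies
     of the text at the URLs will be stored for."""),
    ("x-experimental_ham_spam_imbalance_adjustment",
     "Compensate for unequal numbers of spam and ham", False,
     """(DEPRECATED) If your training database has significantly more ham
     than spam, or vice versa, you may start seeing an increase in
     incorrect classifications."""),
    # Made-up entry with no marker, included only for contrast.
    ("hypothetical_stable_option", "Hypothetical unmarked option", True,
     """An option whose docstring carries no status marker."""),
]

# Map each option name to the status found in its docstring.
statuses = {name: option_status(doc) for name, _display, _default, doc in options}

for name, status in statuses.items():
    print(f"{name}: {status}")
```

Keeping the marker at the very start of the docstring, as revision 1.96 does, means a single anchored pattern is enough; a marker placed in the display name (as x-use_bigrams had it in 1.95) would need a separate check.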



