[Spambayes-checkins] spambayes Options.py,1.17,1.18 classifier.py,1.11,1.12

Tue, 17 Sep 2002 18:42:00 -0700

Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv8672

Modified Files:
	Options.py classifier.py 
Log Message:
adjust_probs_by_evidence_mass is history -- the reported results weren't
strong and consistent enough to justify keeping it.


Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Options.py,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** Options.py	17 Sep 2002 17:57:39 -0000	1.17
--- Options.py	18 Sep 2002 01:41:58 -0000	1.18
***************
*** 139,146 ****
  
  max_discriminators: 16
- 
- # Speculative change to allow giving probabilities more weight the more
- # messages went into computing them.
- adjust_probs_by_evidence_mass: False
  """
  
--- 139,142 ----

Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/classifier.py,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** classifier.py	15 Sep 2002 07:45:31 -0000	1.11
--- classifier.py	18 Sep 2002 01:41:58 -0000	1.12
***************
*** 547,551 ****
          nham = float(self.nham or 1)
          nspam = float(self.nspam or 1)
-         fiddle = options.adjust_probs_by_evidence_mass
          for word,record in self.wordinfo.iteritems():
              # Compute prob(msg is spam | msg contains word).
--- 547,550 ----
***************
*** 560,580 ****
              elif prob > MAX_SPAMPROB:
                  prob = MAX_SPAMPROB
- 
-             if fiddle:
-                 # Suppose two clues have spamprob 0.99.  Which one is better?
-                 # One reasonable guess is that it's the one derived from the
-                 # most data.  This code fiddles non-0.5 probabilities by
-                 # shrinking their distance to 0.5, but shrinking less the
-                 # more evidence went into computing them.  Note that if this
-                 # proves to work, it should allow getting rid of the
-                 # "cancelling evidence" complications in spamprob()
-                 # (two probs exactly the same distance from 0.5 are far
-                 # less common after this transformation; instead, spamprob()
-                 # will pick up on the clues with the most evidence backing
-                 # them up).
-                 dist = prob - 0.5
-                 sum = hamcount + spamcount
-                 dist *= sum / (sum + 0.1)
-                 prob = 0.5 + dist
  
              if record.spamprob != prob:
--- 559,562 ----
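The removed `fiddle` branch in classifier.py 1.11 transformed each clamped probability inline in the update loop. Below is a minimal standalone sketch of that arithmetic for readers reviewing what was dropped. `adjust_by_evidence_mass` is a hypothetical helper name, not part of spambayes. `total` replaces the original local `sum` so the builtin is not shadowed. The rest follows the deleted lines: the distance from 0.5 is scaled by `total / (total + 0.1)`, so a clue backed by more messages keeps more of its distance.

```python
def adjust_by_evidence_mass(prob, hamcount, spamcount):
    """Shrink prob's distance from 0.5, shrinking less the more evidence.

    Hypothetical standalone version of the inline `fiddle` logic removed
    from classifier.py 1.11 (that code used a local named `sum`).
    """
    dist = prob - 0.5                    # signed distance from neutral
    total = hamcount + spamcount         # messages that contained the word
    dist *= total / (total + 0.1)        # factor approaches 1 as total grows
    return 0.5 + dist


# Two clues that both have spamprob 0.99, backed by different amounts of data:
weak = adjust_by_evidence_mass(0.99, 0, 1)     # about 0.945
strong = adjust_by_evidence_mass(0.99, 0, 100) # about 0.9895
```

Under this transformation the clue seen in 100 spam messages ends up further from 0.5 than the one seen once. That is the tie-breaking effect the deleted comment hoped would let `spamprob()` drop its "cancelling evidence" handling. Per the log message, the reported results did not justify keeping it.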