[Spambayes-checkins]
spambayes Options.py,1.17,1.18 classifier.py,1.11,1.12
Tim Peters
tim_one@users.sourceforge.net
Tue, 17 Sep 2002 18:42:00 -0700
Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv8672
Modified Files:
Options.py classifier.py
Log Message:
adjust_probs_by_evidence_mass is history -- the reported results weren't
strong and consistent enough to justify keeping it.
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Options.py,v
retrieving revision 1.17
retrieving revision 1.18
diff -C2 -d -r1.17 -r1.18
*** Options.py 17 Sep 2002 17:57:39 -0000 1.17
--- Options.py 18 Sep 2002 01:41:58 -0000 1.18
***************
*** 139,146 ****
max_discriminators: 16
-
- # Speculative change to allow giving probabilities more weight the more
- # messages went into computing them.
- adjust_probs_by_evidence_mass: False
"""
--- 139,142 ----
Index: classifier.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/classifier.py,v
retrieving revision 1.11
retrieving revision 1.12
diff -C2 -d -r1.11 -r1.12
*** classifier.py 15 Sep 2002 07:45:31 -0000 1.11
--- classifier.py 18 Sep 2002 01:41:58 -0000 1.12
***************
*** 547,551 ****
nham = float(self.nham or 1)
nspam = float(self.nspam or 1)
- fiddle = options.adjust_probs_by_evidence_mass
for word,record in self.wordinfo.iteritems():
# Compute prob(msg is spam | msg contains word).
--- 547,550 ----
***************
*** 560,580 ****
elif prob > MAX_SPAMPROB:
prob = MAX_SPAMPROB
-
- if fiddle:
- # Suppose two clues have spamprob 0.99. Which one is better?
- # One reasonable guess is that it's the one derived from the
- # most data. This code fiddles non-0.5 probabilities by
- # shrinking their distance to 0.5, but shrinking less the
- # more evidence went into computing them. Note that if this
- # proves to work, it should allow getting rid of the
- # "cancelling evidence" complications in spamprob()
- # (two probs exactly the same distance from 0.5 are far
- # less common after this transformation; instead, spamprob()
- # will pick up on the clues with the most evidence backing
- # them up).
- dist = prob - 0.5
- sum = hamcount + spamcount
- dist *= sum / (sum + 0.1)
- prob = 0.5 + dist
if record.spamprob != prob:
--- 559,562 ----
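
For readers who want to see what the removed option did, here is a minimal standalone sketch of the adjustment deleted from classifier.py above. The function name shrink_toward_neutral and the damping parameter are hypothetical names introduced for illustration; the removed code did this inline with a hard-coded 0.1. The sketch is written for Python 3 rather than the Python 2 of the original.

```python
def shrink_toward_neutral(prob, hamcount, spamcount, damping=0.1):
    """Pull prob toward 0.5, shrinking less the more messages back it up.

    Mirrors the arithmetic of the removed block; the name and the
    damping parameter are assumptions made for this illustration.
    """
    total = hamcount + spamcount
    dist = prob - 0.5
    dist *= total / (total + damping)
    return 0.5 + dist


# Two clues with the same raw spamprob but different amounts of evidence.
weak = shrink_toward_neutral(0.99, hamcount=0, spamcount=1)
strong = shrink_toward_neutral(0.99, hamcount=0, spamcount=100)
print(weak, strong)
```

The clue seen in one message ends up slightly farther from 0.99 than the clue seen in 100 messages, so exact ties in distance from 0.5 become rarer, which is the effect the removed comment describes.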