[Spambayes-checkins] spambayes Histogram.py,1.5,1.6 Options.py,1.50,1.51
Tim Peters
tim_one@users.sourceforge.net
Thu, 17 Oct 2002 23:58:57 -0700
- Previous message: [Spambayes-checkins] spambayes Options.py,1.49,1.50
README.txt,1.37,1.38
TestDriver.py,1.25,1.26 classifier.py,1.38,1.39 clgen.py,1.1,NONE
clpik.py,1.1,NONE rmspik.py,1.4,NONE
- Next message: [Spambayes-checkins]
spambayes Options.py,1.51,1.52 classifier.py,1.39,1.40
Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv14705
Modified Files:
Histogram.py Options.py
Log Message:
Patch inspired by Rob Hooft: new option "percentiles", giving a list
of percentile points to compute and display with histograms.
Index: Histogram.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Histogram.py,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** Histogram.py 8 Oct 2002 18:13:49 -0000 1.5
--- Histogram.py 18 Oct 2002 06:58:55 -0000 1.6
***************
*** 30,33 ****
--- 30,34 ----
# median midpoint
# mean
+ # pct list of (percentile, score) pairs
# var variance
# sdev population standard deviation (sqrt(variance))
***************
*** 66,69 ****
--- 67,88 ----
          self.var = var / n
          self.sdev = math.sqrt(self.var)
+         # Compute percentiles.
+         self.pct = pct = []
+         for p in options.percentiles:
+             assert 0.0 <= p <= 100.0
+             # In going from data index 0 to index n-1, we move n-1 times.
+             # p% of that is (n-1)*p/100.
+             i = (n-1)*p/1e2
+             if i < 0:
+                 # Just return the smallest.
+                 score = data[0]
+             else:
+                 whole = int(i)
+                 frac = i - whole
+                 score = data[whole]
+                 if whole < n-1 and frac:
+                     # Move frac of the way from this score to the next.
+                     score += frac * (data[whole + 1] - score)
+             pct.append((p, score))

      # Merge other into self.
***************
*** 125,128 ****
--- 144,150 ----
self.median,
self.max)
+ pcts = ['%g%% %g' % x for x in self.pct]
+ print "-> <stat> percentiles:", '; '.join(pcts)
+
lo, hi = self.get_lo_hi()
if lo > hi:
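
For anyone who wants to try the interpolation outside the patch, here is a minimal standalone sketch in Python 3 syntax (the patch itself is Python 2). The names percentile_scores and format_percentiles are hypothetical and do not exist in Histogram.py. The sketch sorts its input itself, returns an empty list for empty data, and raises ValueError where the patch uses an assert. The i < 0 branch is left out because the range check keeps i non-negative for non-empty data.

```python
def percentile_scores(data, points):
    """Return (percentile, score) pairs using linear interpolation.

    Hypothetical helper mirroring the loop added to Histogram.py.
    """
    data = sorted(data)
    n = len(data)
    if n == 0:
        return []
    pairs = []
    for p in points:
        if not 0.0 <= p <= 100.0:
            raise ValueError("percentile out of range: %r" % (p,))
        # Going from index 0 to index n-1 takes n-1 steps;
        # p% of that distance is (n-1)*p/100.
        i = (n - 1) * p / 100.0
        whole = int(i)
        frac = i - whole
        score = data[whole]
        if whole < n - 1 and frac:
            # Move frac of the way from this score to the next one.
            score += frac * (data[whole + 1] - score)
        pairs.append((p, score))
    return pairs


def format_percentiles(pairs):
    """Join the pairs the same way the patched display code does."""
    return '; '.join('%g%% %g' % pair for pair in pairs)


if __name__ == "__main__":
    pairs = percentile_scores([10, 20], [0, 25, 100])
    print("-> <stat> percentiles:", format_percentiles(pairs))
```

With two scores 10 and 20, the 25th percentile lands a quarter of the way between them, which is the behaviour the "Move frac of the way" comment describes.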
Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Options.py,v
retrieving revision 1.50
retrieving revision 1.51
diff -C2 -d -r1.50 -r1.51
*** Options.py 18 Oct 2002 05:44:04 -0000 1.50
--- Options.py 18 Oct 2002 06:58:55 -0000 1.51
***************
*** 148,151 ****
--- 148,157 ----
best_cutoff_unsure_weight: 0.20
+ # Histogram analysis also displays percentiles. For each percentile p
+ # in the list, the score S such that p% of all scores are <= S is given.
+ # Note that percentile 50 is the median, and is displayed (along with the
+ # min score and max score) independent of this option.
+ percentiles: 5 25 75 95
+
# Display spam when
# show_spam_lo <= spamprob <= show_spam_hi
***************
*** 288,291 ****
--- 294,298 ----
'show_unsure': boolean_cracker,
'show_histograms': boolean_cracker,
+ 'percentiles': ('get', lambda s: map(float, s.split())),
'show_best_discriminators': int_cracker,
'save_trained_pickles': boolean_cracker,
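
The 'percentiles' cracker above turns the whitespace-separated option string into a list of floats. A minimal sketch of the same conversion in Python 3 syntax follows. The name parse_percentiles is hypothetical and not part of Options.py, and the range check is an added assumption (the patch checks the range later, in Histogram.py). Note that in Python 3 map() returns an iterator rather than a list, so the sketch builds the list explicitly.

```python
def parse_percentiles(value):
    """Convert an option string such as '5 25 75 95' to a list of floats.

    Hypothetical helper; Options.py uses a lambda with map() instead.
    """
    points = [float(token) for token in value.split()]
    for p in points:
        # Assumption: reject bad values early instead of at histogram time.
        if not 0.0 <= p <= 100.0:
            raise ValueError("percentile out of range: %r" % (p,))
    return points


if __name__ == "__main__":
    print(parse_percentiles("5 25 75 95"))
```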