[Spambayes-checkins] spambayes Histogram.py,1.5,1.6 Options.py,1.50,1.51

Tim Peters tim_one@users.sourceforge.net
Thu, 17 Oct 2002 23:58:57 -0700


Update of /cvsroot/spambayes/spambayes
In directory usw-pr-cvs1:/tmp/cvs-serv14705

Modified Files:
	Histogram.py Options.py 
Log Message:
Patch inspired by Rob Hooft:  new option "percentiles", giving a list
of percentile points to compute and display with histograms.


Index: Histogram.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Histogram.py,v
retrieving revision 1.5
retrieving revision 1.6
diff -C2 -d -r1.5 -r1.6
*** Histogram.py	8 Oct 2002 18:13:49 -0000	1.5
--- Histogram.py	18 Oct 2002 06:58:55 -0000	1.6
***************
*** 30,33 ****
--- 30,34 ----
      #     median    midpoint
      #     mean
+     #     pct       list of (percentile, score) pairs
      #     var       variance
      #     sdev      population standard deviation (sqrt(variance))
***************
*** 66,69 ****
--- 67,88 ----
          self.var = var / n
          self.sdev = math.sqrt(self.var)
+         # Compute percentiles.
+         self.pct = pct = []
+         for p in options.percentiles:
+             assert 0.0 <= p <= 100.0
+             # In going from data index 0 to index n-1, we move n-1 times.
+             # p% of that is (n-1)*p/100.
+             i = (n-1)*p/1e2
+             if i < 0:
+                 # Just return the smallest.
+                 score = data[0]
+             else:
+                 whole = int(i)
+                 frac = i - whole
+                 score = data[whole]
+                 if whole < n-1 and frac:
+                     # Move frac of the way from this score to the next.
+                     score += frac * (data[whole + 1] - score)
+             pct.append((p, score))
  
      # Merge other into self.
***************
*** 125,128 ****
--- 144,150 ----
                                                         self.median,
                                                         self.max)
+         pcts = ['%g%% %g' % x for x in self.pct]
+         print "-> <stat> percentiles:", '; '.join(pcts)
+ 
          lo, hi = self.get_lo_hi()
          if lo > hi:
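
For readers following the Histogram.py change, the interpolation added above can be tried on its own. The following is a minimal standalone sketch, not the committed code. The function name percentile_scores and the sample scores are made up for illustration. It assumes the data list is already sorted in ascending order, and it returns an empty list for empty input, which is an assumption the patch does not show.

```python
def percentile_scores(data, percentiles):
    """Return a list of (percentile, score) pairs.

    Assumes `data` is sorted ascending (hypothetical helper, not part
    of Histogram.py).
    """
    n = len(data)
    if n == 0:
        # Assumption: nothing sensible to report for empty data.
        return []
    pairs = []
    for p in percentiles:
        assert 0.0 <= p <= 100.0
        # Going from index 0 to index n-1 takes n-1 steps;
        # p% of that distance is (n-1)*p/100.
        i = (n - 1) * p / 100.0
        whole = int(i)
        frac = i - whole
        score = data[whole]
        if whole < n - 1 and frac:
            # Move frac of the way from this score to the next one.
            score += frac * (data[whole + 1] - score)
        pairs.append((p, score))
    return pairs


# Illustrative data only: five evenly spaced scores.
scores = [0.0, 10.0, 20.0, 30.0, 40.0]
result = percentile_scores(scores, [5, 25, 75, 95])
```

With five scores there are four steps between the first and last index, so percentile 25 lands exactly on index 1, while percentile 5 falls 0.2 of the way from index 0 to index 1 and is linearly interpolated.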

Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Options.py,v
retrieving revision 1.50
retrieving revision 1.51
diff -C2 -d -r1.50 -r1.51
*** Options.py	18 Oct 2002 05:44:04 -0000	1.50
--- Options.py	18 Oct 2002 06:58:55 -0000	1.51
***************
*** 148,151 ****
--- 148,157 ----
  best_cutoff_unsure_weight:  0.20
  
+ # Histogram analysis also displays percentiles.  For each percentile p
+ # in the list, the score S such that p% of all scores are <= S is given.
+ # Note that percentile 50 is the median, and is displayed (along with the
+ # min score and max score) independent of this option.
+ percentiles: 5 25 75 95
+ 
  # Display spam when
  #     show_spam_lo <= spamprob <= show_spam_hi
***************
*** 288,291 ****
--- 294,298 ----
                     'show_unsure': boolean_cracker,
                     'show_histograms': boolean_cracker,
+                    'percentiles': ('get', lambda s: map(float, s.split())),
                     'show_best_discriminators': int_cracker,
                     'save_trained_pickles': boolean_cracker,
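
The Options.py change turns the whitespace-separated option string into a list of floats, and the Histogram.py change prints each (percentile, score) pair with %g formatting. A rough sketch of both steps in isolation follows. The helper names parse_percentiles and format_percentiles are hypothetical, and the score values in the example are invented purely to show the output format.

```python
def parse_percentiles(s):
    """Split an option string such as "5 25 75 95" into a list of floats."""
    return [float(x) for x in s.split()]


def format_percentiles(pairs):
    """Render (percentile, score) pairs the way the new histogram line does."""
    return '; '.join('%g%% %g' % pair for pair in pairs)


points = parse_percentiles("5 25 75 95")

# Invented scores, for illustration of the formatting only.
line = "-> <stat> percentiles: " + format_percentiles([(5.0, 2.0), (25.0, 10.0)])
```

A list comprehension is used here instead of map() so that the result is a real list on Python versions where map() returns an iterator.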