[Numpy-discussion] help creating a reversed cumulative histogram

Thu Sep 3 09:23:06 EDT 2009

Hello,
I have checked the snippets you proposed.
It does what I wanted to achieve.
Obviously, I had to subtract the values as Robert 
demonstrated. This could also be seen from 
the figure I posted.

I still have to see how I can optimise the code 
(cf. below) or modify it to be less complicated.
It seemed so simple in the spreadsheet...

> eisf_sums = ecdf_sums[-1] - ecdf_sums
> # empirical inverse survival function of weights
Can you recommend a (literature) source where 
I can look up this term?
I learned statistics in my mother tongue and seem 
to need a refresher on distributions...
I would like to use the right terms 
next time.
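
For reference, the quoted line subtracts the running total at each bin edge from the grand total, which is the unnormalised form of the survival function S(x) = 1 - F(x). The sketch below uses made-up data (the `values` array is only an assumption for illustration) and prepends 0 rather than 1.0 to the cumulative sum. It shows that the result is the same as a reversed cumulative sum of the histogram.

```python
import numpy as np

# Hypothetical sample data, only for illustration.
values = np.array([1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 7.0, 9.0])

counts, edges = np.histogram(values, bins=4)

# Cumulative counts at each bin edge, starting from 0 at the left-most edge.
ecdf_counts = np.concatenate(([0], counts.cumsum()))

# Unnormalised survival function: what remains to the right of each edge.
esf_counts = ecdf_counts[-1] - ecdf_counts

# The same quantity computed directly as a reversed cumulative sum.
reversed_cumsum = np.concatenate((counts[::-1].cumsum()[::-1], [0]))

print(esf_counts)
print(reversed_cumsum)
```

Dividing `esf_counts` by `ecdf_counts[-1]` would give the normalised version that runs from 1 down to 0.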

> Are you sure you want cumulative weights in 
> the histogram?
You mean it doesn't make sense at all?

I need:
1) the count of occurrences sorted into each bin
    counts = np.histogram(values,
                          normed=normed,
                          bins=bins)
    => here I now obtain the same as in the 
    spreadsheet

2) the sum of all values sorted into each bin
    sums = np.histogram(values, weights=values,
                        normed=normed,
                        bins=bins)

    => here I still obtain a different value for the first
    entry (eisf_sums[0]):
    Numpy: eisf_sums
    335.50026738, 319.21363636, 266.07724942,  
    198.10258741, 126.69270396, 67.98125874,   
    38.47335664,  24.75062937, 13.42121212,   
    2.48636364, 0.

    Spreadsheet:
    335.2351159, 319.2136364, 266.0772494, 
    198.1025874, 126.692704, 67.98125874, 
    38.47335664, 24.75062937, 13.42121212, 
    2.486363636, 0
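
One thing that may contribute to the deviation in eisf_sums[0]: the distilled code prepends 1.0 to the cumulative sum, and that constant only ever affects the first entry after the subtraction. The sketch below uses made-up data (the `values` array and the helper `tail_sums` are assumptions for illustration, not part of the original code) to show this. It may not account for the whole gap, which also depends on how the spreadsheet treats values lying exactly on the left-most bin edge.

```python
import numpy as np

# Hypothetical sample data, only for illustration.
values = np.array([1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 7.0, 9.0])

sums, edges = np.histogram(values, weights=values, bins=4)

def tail_sums(start):
    # Prepend `start`, accumulate the per-bin sums, subtract from the total.
    ecdf = np.concatenate(([start], sums.cumsum()))
    return ecdf[-1] - ecdf

with_one = tail_sums(1.0)   # as in the distilled code
with_zero = tail_sums(0.0)  # cumulative sum starting from zero

# Direct check at the left edge of every bin: sum of all values >= edge.
direct = np.array([values[values >= e].sum() for e in edges[:-1]])

print(with_one)
print(with_zero)
print(direct)
```

Only the first entry differs between the two variants, and the zero-based one agrees with the direct sums at the left bin edges.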

Additionally, I would like to see these implemented 
as convenience functions in numpy or scipy.
There should be out-of-the-box functions for all kinds 
of distributions.
Where is the best place to contribute a final version?
scipy.stats?

Thanks again for your input,
Timmie

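##### below the distilled code ##### 
import numpy as np
import matplotlib.pyplot as plt

# `values` is the 1-D array of input data (not shown here)

## histogram settings
normed = False
bins = 10 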

## counts: gives expected results
counts = np.histogram(values,
                      normed=normed,
                      bins=bins)

ecdf_counts = np.hstack([1.0, counts[0].cumsum() ])
ecdf_inv_counts = ecdf_counts[::-1]
# empirical inverse survival function of weights
eisf_counts = ecdf_counts[-1] - ecdf_counts   

### sum: does have deviations
sums = np.histogram(values, weights=values,
                    normed=normed,
                    bins=bins)
ecdf_sums = np.hstack([1.0, sums[0].cumsum() ])
ecdf_inv_sums = ecdf_sums[::-1]
# empirical inverse survival function of weights
eisf_sums = ecdf_sums[-1] - ecdf_sums

##
# configure plot
xlabel = 'Bins'
ylabel_left = 'Counts'
ylabel_right = 'Sum'

fig1 = plt.figure()
ax1 = fig1.add_subplot(111)

# counts (blue line, matching the blue left axis)
ax1.plot(counts[1], ecdf_inv_counts, 'b-')
ax1.set_xlabel(xlabel)
ax1.set_ylabel(ylabel_left, color='b')
for tl in ax1.get_yticklabels():
    tl.set_color('b')

# sums (red line, matching the red right axis)
ax2 = ax1.twinx()
ax2.plot(sums[1], eisf_sums, 'r-')
ax2.set_ylabel(ylabel_right, color='r')
for tl in ax2.get_yticklabels():
    tl.set_color('r')
plt.show()
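
A minimal sketch of what such a convenience function could look like, assuming a reversed cumulative sum over the histogram bins is what is wanted. The name `reversed_cumulative_histogram` is hypothetical and not part of numpy or scipy, and the sample data is made up. The sketch leaves out `normed`, which newer NumPy releases no longer accept.

```python
import numpy as np

def reversed_cumulative_histogram(values, bins=10, weights=None):
    """Hypothetical helper (not a numpy/scipy function).

    Returns, for each bin edge, the total count (or total weight)
    of everything binned at or to the right of that edge, plus the edges.
    """
    hist, edges = np.histogram(values, bins=bins, weights=weights)
    # Reverse, accumulate, reverse back; append 0 for the right-most edge.
    tail = np.concatenate((hist[::-1].cumsum()[::-1], [0]))
    return tail, edges

# Hypothetical sample data, only for illustration.
values = np.array([1.0, 2.0, 2.5, 3.0, 4.5, 5.0, 7.0, 9.0])

tail_counts, edges = reversed_cumulative_histogram(values, bins=4)
tail_sums, _ = reversed_cumulative_histogram(values, bins=4, weights=values)

print(tail_counts)
print(tail_sums)
```

Both returned arrays have one entry per bin edge, so they can be plotted directly against `edges` in the same way as in the script above.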