[Numpy-discussion] help creating a reversed cumulative histogram
Tim Michelsen
timmichelsen at gmx-topmail.de
Thu Sep 3 09:23:06 EDT 2009
Hello,
I have checked the snippets you proposed.
They do what I wanted to achieve.
Obviously, I had to subtract the values as Robert
demonstrated. This could also be seen from
the figure I posted.
I still have to see how I can optimise the code
(cf. below) or make it less complicated.
It seemed so simple in the spreadsheet...
> eisf_sums = ecdf_sums[-1] - ecdf_sums
> # empirical inverse survival function of weights
Can you recommend a (literature) source where
I can look up this term?
I learned statistics in my mother tongue and seem
to need a refresher on distributions...
I would like to use the right terms
next time.
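
For orientation: the quantity in the quoted line is closely related to what is usually called the empirical survival function (the complement of the empirical CDF), i.e. the fraction of observations above a given value. The distributions in scipy.stats expose the theoretical counterpart as the `sf` method. A minimal sketch, with a made-up sample and a hypothetical helper name `empirical_sf`:

```python
import numpy as np

# Made-up sample, for illustration only
sample = np.array([1.0, 2.0, 2.0, 3.0, 5.0])

def empirical_sf(data, x):
    """Hypothetical helper: fraction of observations strictly greater than x (1 - ECDF)."""
    data = np.asarray(data)
    return np.mean(data > x)

# Two of the five values (3.0 and 5.0) exceed 2.0
sf_at_2 = empirical_sf(sample, 2.0)
```

On binned data the same idea becomes "total minus running sum", which is what `ecdf_sums[-1] - ecdf_sums` computes.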
> Are you sure you want cumulative weights in
> the histogram?
You mean it doesn't make sense at all?
I need:
1) the count of occurrences sorted into each bin
    counts = np.histogram(values,
                          normed=normed,
                          bins=bins)
=> here I now obtain the same as in the
spreadsheet
2) the sum of all values sorted into each bin
    sums = np.histogram(values, weights=values,
                        normed=normed,
                        bins=bins)
=> here I still obtain a different value for the first
histogram value (eisf_sums[0]):
Numpy: eisf_sums
335.50026738, 319.21363636, 266.07724942,
198.10258741, 126.69270396, 67.98125874,
38.47335664, 24.75062937, 13.42121212,
2.48636364, 0.
Spreadsheet:
335.2351159, 319.2136364, 266.0772494,
198.1025874, 126.692704, 67.98125874,
38.47335664, 24.75062937, 13.42121212,
2.486363636, 0
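
One thing that may be worth checking for the first value: the code below prepends 1.0 to the cumulative sums, and that leading element only affects the first entry of the reversed result. The sketch below uses made-up per-bin sums (not the data from this thread) to show the effect. Whether this accounts for the whole difference is an open question, since the spreadsheet formula is not shown here.

```python
import numpy as np

# Made-up per-bin sums, for illustration only
bin_sums = np.array([2.0, 3.0, 5.0])

# Leading 1.0, as in the code below
with_one = np.hstack([1.0, bin_sums.cumsum()])
# Leading 0.0: nothing has been accumulated before the first bin edge
with_zero = np.hstack([0.0, bin_sums.cumsum()])

# Reversed cumulative values: total minus running sum
rev_one = with_one[-1] - with_one     # first element is total - 1.0
rev_zero = with_zero[-1] - with_zero  # first element is the total itself
```

All entries after the first are identical in the two variants.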
Additionally, I would like to see these implemented
as convenience functions in numpy or scipy.
There should be out-of-the-box functions for all kinds
of distributions.
Where is the best place to contribute a final version?
scipy.stats?
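
As a rough idea of what such a convenience function could look like, here is a minimal sketch. The name `reversed_cumulative_histogram` and its signature are hypothetical and not part of numpy or scipy; it leaves out `normed`, which newer NumPy releases no longer accept, and it starts the accumulation at zero rather than 1.0.

```python
import numpy as np

def reversed_cumulative_histogram(values, bins=10, weights=None):
    """Hypothetical helper: for each bin edge, the total of counts
    (or weights) in all bins to the right of that edge."""
    hist, edges = np.histogram(values, bins=bins, weights=weights)
    # Accumulate from the right; nothing lies beyond the last edge, so append 0
    rev = np.hstack([hist[::-1].cumsum()[::-1], 0])
    return rev, edges

# Made-up data, for illustration only
data = np.array([1.0, 2.0, 2.0, 3.0, 5.0])
rev_counts, edges = reversed_cumulative_histogram(data, bins=4)
rev_sums, _ = reversed_cumulative_histogram(data, bins=4, weights=data)
```

Passing `weights=data` gives the reversed cumulative sum of the values, the same pattern as `sums` in the code below.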
Thanks again for your input,
Timmie
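##### below the distilled code #####
import numpy as np
import matplotlib.pyplot as plt
# values: 1-D array holding the data (defined elsewhere)
## histogram settings
normed = False
bins = 10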
## counts: gives expected results
counts = np.histogram(values,
                      normed=normed,
                      bins=bins)
ecdf_counts = np.hstack([1.0, counts[0].cumsum()])
ecdf_inv_counts = ecdf_counts[::-1]
# empirical inverse survival function of counts
eisf_counts = ecdf_counts[-1] - ecdf_counts
## sums: shows deviations
sums = np.histogram(values, weights=values,
                    normed=normed,
                    bins=bins)
ecdf_sums = np.hstack([1.0, sums[0].cumsum()])
ecdf_inv_sums = ecdf_sums[::-1]
# empirical inverse survival function of weights
eisf_sums = ecdf_sums[-1] - ecdf_sums
##
# configure plot
xlabel = 'Bins'
ylabel_left = 'Counts'
ylabel_right = 'Sum'
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
# counts (left axis, blue line to match the blue axis labels)
ax1.plot(counts[1], ecdf_inv_counts, 'b-')
ax1.set_xlabel(xlabel)
ax1.set_ylabel(ylabel_left, color='b')
for tl in ax1.get_yticklabels():
    tl.set_color('b')
# sums (right axis, red line to match the red axis labels)
ax2 = ax1.twinx()
ax2.plot(sums[1], eisf_sums, 'r-')
ax2.set_ylabel(ylabel_right, color='r')
for tl in ax2.get_yticklabels():
    tl.set_color('r')
plt.show()