[Numpy-discussion] [Newbie] Fast plotting
Sebastian Stephan Berg
sebastian at sipsolutions.net
Tue Jan 6 09:45:31 EST 2009
Hello,
Just a thought: if the parameter takes only a limited number of distinct
values, you may be able to use the histogram function. Computing one
histogram with Y as weights and one without weights, then dividing the
first by the second to get the means yourself, should be pretty fast, I
imagine. Other than that, you could sort the whole thing, then use
searchsorted with side='right' to find the slice belonging to each
distinct x value and work on those slices. I mean something like this:
def spam(x, y, work_on_copy=False):
    """Take the arrays x and y and return
    unique_x_values, means, stds, maxs, mins
    as lists. means, stds, maxs and mins are those
    of the corresponding y values.
    If work_on_copy is true, x and y are copied to ensure
    that they are not sorted in place.
    """
    u, means, stds, maxs, mins = [], [], [], [], []
    s = x.argsort()
    if work_on_copy:
        # Fancy indexing returns sorted copies; the caller's arrays are untouched.
        x = x[s]
        y = y[s]
    else:
        # Sort both arrays in place (x[s] is built as a temporary first).
        x[:] = x[s]
        y[:] = y[s]
    start = 0
    value = x[0]
    while True:
        # x is sorted, so this is one past the last element equal to value.
        end = x.searchsorted(value, side='right')
        chunk = y[start:end]  # all y values belonging to this x value
        u.append(value)
        means.append(chunk.mean())
        stds.append(chunk.std())
        maxs.append(chunk.max())
        mins.append(chunk.min())
        if end == len(x):
            break
        value = x[end]
        start = end
    return u, means, stds, maxs, mins
This is of course basically the same as what Francesc suggested, but a
quick test suggests that it scales better. I haven't timed the histogram
approach.
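The histogram idea mentioned at the start can be sketched roughly as follows. This is a minimal, untimed sketch, assuming x takes a limited number of distinct values; the helper name grouped_stats_histogram is hypothetical. It builds one bin per distinct x with np.unique, then uses a plain count histogram, a histogram weighted by y, and (as an assumed extension for the standard deviation) one weighted by y**2.

```python
import numpy as np

def grouped_stats_histogram(x, y):
    """Hypothetical helper: per-distinct-x mean and std of y via np.histogram.

    Assumes x has a limited number of distinct values.
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    u = np.unique(x)  # sorted distinct x values
    # One bin per distinct value: [u[i], u[i+1]) holds exactly the x == u[i].
    # np.histogram closes the last bin on the right, so [u[-1], u[-1] + 1]
    # catches x == u[-1].
    edges = np.append(u, u[-1] + 1).astype(float)
    counts, _ = np.histogram(x, bins=edges)
    sums, _ = np.histogram(x, bins=edges, weights=y)
    sq_sums, _ = np.histogram(x, bins=edges, weights=y * y)
    means = sums / counts
    # Population variance (ddof=0, like ndarray.std()); clip tiny negative
    # values caused by rounding before taking the square root.
    var = np.maximum(sq_sums / counts - means ** 2, 0.0)
    return u, means, np.sqrt(var)
```

Histograms only give counts and sums, so min and max per x value would still need something like the sorted-slice loop in spam. The sum-of-squares formula for the variance is also less numerically robust than calling std() on each slice.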
Sebastian
On Tue, 2009-01-06 at 10:35 +0100, Franck Pommereau wrote:
> Hi all, and happy new year!
>
> I'm new to NumPy and searching for a way to compute, from a set of
> points (x,y), the mean of the y values associated with each distinct x
> value. Each point corresponds to a measurement in a benchmark
> (x = parameter, y = computation time) and I'd like to plot the mean
> computation time with respect to the parameter values. (I know how to
> plot, but not how to compute the mean values.)
>
> My points are stored as two arrays X, Y (same size).
> In pure Python, I'd do as follows:
>
> from numpy import array
>
> s = {} # sum of y values for each distinct x (as keys)
> n = {} # number of summed values (same keys)
> for x, y in zip(X, Y):
>     s[x] = s.get(x, 0.0) + y
>     n[x] = n.get(x, 0) + 1
> new_x = array(sorted(s))
> new_y = array([s[x]/n[x] for x in sorted(s)])
>
> Unfortunately, this code is much too slow because my arrays have
> millions of elements. But I'm pretty sure that NumPy offers a way to
> handle this more elegantly and much faster.
>
> As a bonus, I'd be happy if the solution would allow me to compute also
> standard deviation, min, max, etc.
>
> Thanks in advance for any help!
> Franck