[Numpy-discussion] [Newbie] Fast plotting

Tue Jan 6 09:45:31 EST 2009

Hello,

Just a thought: if the number of distinct parameter values is limited,
you may be able to use the histogram function. Computing one histogram
with Y as weights and one without weights, then dividing the first by
the second to get the means, should be pretty speedy, I imagine. Other
than that, you could sort the whole thing and then use searchsorted with
side='right' to find the slices to work on. I mean something like this:

def spam(x, y, work_on_copy=False):
    """Take the arrays x and y and return
    unique_x_values, means, stds, maxs, mins
    as lists. means, stds, maxs and mins are those
    of the corresponding y values.
    If work_on_copy is true, x and y are copied to ensure
    that they are not sorted in place.
    """

    u, means, stds, maxs, mins = [], [], [], [], []

    # Sort both arrays by x so that equal x values form contiguous runs.
    s = x.argsort()
    if work_on_copy:
        x = x[s]
        y = y[s]
    else:
        x[:] = x[s]
        y[:] = y[s]

    start = 0
    value = x[0]
    while True:
        # stop is one past the last occurrence of value in the sorted x.
        stop = x.searchsorted(value, side='right')
        group = y[start:stop]
        u.append(value)
        means.append(group.mean())
        stds.append(group.std())
        maxs.append(group.max())
        mins.append(group.min())
        if stop == len(x):
            break
        value = x[stop]
        start = stop

    return u, means, stds, maxs, mins

This is of course basically the same as what Francesc suggested, but a
quick test shows that it seems to scale better. I didn't try the speed
of histogram.
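
For reference, the histogram idea might look roughly like the sketch
below. It assumes x and y are one-dimensional arrays of equal length;
mean_by_histogram is only an illustrative name, and the bin edges are
built from the distinct x values so that each value gets its own bin.
Note that numpy.unique sorts internally, so this shows the idea rather
than a guaranteed speed-up.

```python
import numpy as np

def mean_by_histogram(x, y):
    """Return (distinct x values, mean of y for each distinct x value)."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)

    # Assumption: one bin per distinct x value. Bin i covers
    # [values[i], values[i + 1]); the extra last edge closes the final bin.
    values = np.unique(x)
    edges = np.append(values, values[-1] + 1)

    # Weighted histogram gives the sum of y per bin, the unweighted one
    # gives the number of points per bin.
    sums, _ = np.histogram(x, bins=edges, weights=y)
    counts, _ = np.histogram(x, bins=edges)
    return values, sums / counts
```

If the distinct parameter values are known in advance, they could be
used to build the edges directly instead of calling numpy.unique.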

Sebastian

On Tue, 2009-01-06 at 10:35 +0100, Franck Pommereau wrote:
> Hi all, and happy new year!
> 
> I'm new to NumPy and searching for a way to compute, from a set of
> points (x,y), the mean of the y values associated with each distinct
> x value.
> Each point corresponds to a measure in a benchmark (x = parameter,  y =
> computation time) and I'd like to plot the graph of mean computation
> time wrt parameter values. (I know how to plot, but not how to compute
> mean values.)
> 
> My points are stored as two arrays X, Y (same size).
> In pure Python, I'd do as follows:
> 
> from numpy import array
> 
> s = {}  # sum of y values for each distinct x (as keys)
> n = {}  # number of summed values (same keys)
> for x, y in zip(X, Y):
>     s[x] = s.get(x, 0.0) + y
>     n[x] = n.get(x, 0) + 1
> new_x = array(sorted(s))
> new_y = array([s[x] / n[x] for x in sorted(s)])
> 
> Unfortunately, this code is much too slow because my arrays have
> millions of elements. But I'm pretty sure that NumPy offers a way to
> handle this more elegantly and much faster.
> 
> As a bonus, I'd be happy if the solution also allowed me to compute
> the standard deviation, min, max, etc.
> 
> Thanks in advance for any help!
> Franck
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>