[Numpy-discussion] [Newbie] Fast plotting

Francesc Alted faltet at pytables.org
Tue Jan 6 06:56:44 EST 2009


On Tuesday 06 January 2009, Franck Pommereau wrote:
> Hi all, and happy new year!
>
> I'm new to NumPy and searching for a way to compute, from a set of
> points (x,y), the mean of the y values associated with each distinct
> x value. Each point corresponds to a measurement in a benchmark
> (x = parameter, y = computation time) and I'd like to plot the graph
> of mean computation time wrt parameter values. (I know how to plot,
> but not how to compute mean values.)
>
> My points are stored as two arrays X, Y (same size).
> In pure Python, I'd do as follows:
>
> s = {} # sum of y values for each distinct x (as keys)
> n = {} # number of summed values (same keys)
> for x, y in zip(X, Y) :
>     s[x] = s.get(x, 0.0) + y
>     n[x] = n.get(x, 0) + 1
> new_x = array(list(sorted(s)))
> new_y = array([s[x]/n[x] for x in sorted(s)])
>
> Unfortunately, this code is much too slow because my arrays have
> millions of elements. But I'm pretty sure that NumPy offers a way to
> handle this more elegantly and much faster.
>
> As a bonus, I'd be happy if the solution would allow me to compute
> also standard deviation, min, max, etc.

The following would do the trick:

In [92]: x = np.random.randint(100,size=100)

In [93]: y = np.random.rand(100)

In [94]: u = np.unique(x)

In [95]: means = [ y[x == i].mean() for i in u ]

In [96]: stds = [ y[x == i].std() for i in u ]

In [97]: maxs = [ y[x == i].max() for i in u ]

In [98]: mins = [ y[x == i].min() for i in u ]

and the data you want will be in the means, stds, maxs and mins lists,
in the same order as the values in u.  This approach has the drawback
that the array is scanned again for every statistic you extract.  If
you always want the same set of statistics, you can compute them all
in a single loop:

In [99]: means, stds, maxs, mins = [], [], [], []

In [100]: for i in u:
   .....:     g = y[x == i]
   .....:     means.append(g.mean())
   .....:     stds.append(g.std())
   .....:     maxs.append(g.max())
   .....:     mins.append(g.min())
   .....:

which has the same effect as above, but is faster because the mask
x == i is computed once per distinct value instead of once per
statistic.
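
With millions of points and many distinct x values, the loop over u
may still be slow, because each pass compares the whole array against
one value.  One possible alternative, shown here only as a sketch, is
to sort the data once and reduce over contiguous runs of equal x.  The
function name group_stats is made up for illustration, and the sketch
assumes non-empty 1-D inputs of equal length; the standard deviation
is the population one (ddof=0), matching the default of ndarray.std().

```python
import numpy as np

def group_stats(x, y):
    """Per-distinct-x mean, std, max and min of y (hypothetical helper)."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)

    # Sort once so that equal x values form contiguous runs.
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]

    # u: sorted distinct x values; starts: index where each run begins;
    # counts: number of points in each run.
    u, starts, counts = np.unique(xs, return_index=True, return_counts=True)

    # ufunc.reduceat reduces each slice ys[starts[i]:starts[i+1]].
    means = np.add.reduceat(ys, starts) / counts

    # Population variance from squared deviations to each group's mean.
    dev = ys - np.repeat(means, counts)
    stds = np.sqrt(np.add.reduceat(dev * dev, starts) / counts)

    maxs = np.maximum.reduceat(ys, starts)
    mins = np.minimum.reduceat(ys, starts)
    return u, means, stds, maxs, mins
```

Calling u, means, stds, maxs, mins = group_stats(x, y) should give the
same values as the loop above, as arrays rather than lists.  The cost
is dominated by a single sort, so it no longer grows with the number
of distinct x values times the array length.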

Hope that helps,

-- 
Francesc Alted


