[Numpy-discussion] How to implement a 'pivot table?'
Vincent
vincent.nijs at gmail.com
Fri Aug 3 01:57:06 EDT 2007
What is ugly about the module? I like it!
What do you mean about recarray's? Do you think they are they not
appropriate for this type of thing?
When i get some time i'll run some tests versus SAS for the same
operations and do a speed comparison.
Question: Would there be an easy way to merge the summary stats back
into the recarray?
Best,
Vincent
On Aug 1, 11:22 pm, Travis Vaught <tra... at enthought.com> wrote:
> Greetings,
>
> Speaking of brute force... I've attached a rather ugly module that
> let's you do things with a pretty simple interface (session shown
> below). I haven't fully tested the performance, but a million
> records with 5 fields takes about 11 seconds on my Mac to do a
> 'mean'. I'm not sure what your performance considerations are, but
> this may be useful. Record arrays are really nice if they make sense
> for your data.
>
> Travis
>
> (from an ipython command prompt)
>
> In [1]: import testpivot as p
>
> In [2]: a = p.sample_data()
>
> In [3]: a
> Out[3]:
> recarray([('ACorp', 'Region 1', 'Q1', 20000.0),
> ('ACorp', 'Region 1', 'Q2', 22000.0),
> ('ACorp', 'Region 1', 'Q3', 21000.0),
> ('ACorp', 'Region 1', 'Q4', 26000.0),
> ('ACorp', 'Region 2', 'Q1', 23000.0),
> ('ACorp', 'Region 2', 'Q2', 20000.0),
> ('ACorp', 'Region 2', 'Q3', 22000.0),
> ('ACorp', 'Region 2', 'Q4', 21000.0),
> ('ACorp', 'Region 3', 'Q1', 26000.0),
> ('ACorp', 'Region 3', 'Q2', 23000.0),
> ('ACorp', 'Region 3', 'Q3', 29000.0),
> ('ACorp', 'Region 3', 'Q4', 27000.0),
> ('BCorp', 'Region 1', 'Q1', 20000.0),
> ('BCorp', 'Region 1', 'Q2', 20000.0),
> ('BCorp', 'Region 1', 'Q3', 24000.0),
> ('BCorp', 'Region 1', 'Q4', 24000.0),
> ('BCorp', 'Region 2', 'Q1', 21000.0),
> ('BCorp', 'Region 2', 'Q2', 21000.0),
> ('BCorp', 'Region 2', 'Q3', 22000.0),
> ('BCorp', 'Region 2', 'Q4', 29000.0),
> ('BCorp', 'Region 3', 'Q1', 28000.0),
> ('BCorp', 'Region 3', 'Q2', 25000.0),
> ('BCorp', 'Region 3', 'Q3', 22000.0),
> ('BCorp', 'Region 3', 'Q4', 21000.0)],
> dtype=[('company', '|S5'), ('region', '|S8'), ('quarter', '|
> S2'), ('income', '<f8')])
>
> In [4]: p.pivot(a, 'company', 'region', 'income', p.psum)
> ######## Summary by company and region ##########
> cols:['ACorp' 'BCorp']
> rows:['Region 1' 'Region 2' 'Region 3']
> [[ 89000. 88000.]
> [ 86000. 93000.]
> [ 105000. 96000.]]
>
> In [5]: p.pivot(a, 'company', 'quarter', 'income', p.psum)
> ######## Summary by company and quarter ##########
> cols:['ACorp' 'BCorp']
> rows:['Q1' 'Q2' 'Q3' 'Q4']
> [[ 69000. 69000.]
> [ 65000. 66000.]
> [ 72000. 68000.]
> [ 74000. 74000.]]
>
> In [6]: p.pivot(a, 'company', 'quarter', 'income', p.pmean)
> ######## Summary by company and quarter ##########
> cols:['ACorp' 'BCorp']
> rows:['Q1' 'Q2' 'Q3' 'Q4']
> [[ 23000. 23000. ]
> [ 21666.66666667 22000. ]
> [ 24000. 22666.66666667]
> [ 24666.66666667 24666.66666667]]
>
> testpivot.py
> 3KDownload
>
>
>
> On Aug 1, 2007, at 2:02 PM, Bruce Southey wrote:
>
> > Hi,
> > The hard part is knowing what aggregate function that you want. So a
> > hard way, even after cheating, to take the data provided is given
> > below. (The Numpy Example List was very useful especially on the where
> > function)!
>
> > I tried to be a little generic so you can replace the sum by any
> > suitable function and probably the array type as well. Of course it is
> > not complete because you still need to know the levels of the 'rows'
> > and 'columns' and also is not efficient as it has loops.
>
> > Bruce
>
> > from numpy import *
> > A=array([[1,1,10],
> > [1,1,20],
> > [1,2,30],
> > [2,1,40],
> > [2,2,50],
> > [2,2,60] ])
> > C = zeros((2,2))
>
> > for i in range(2):
> > crit1 = (A[:,0]==1+i)
> > subA=A[crit1,1:]
> > for j in range(2):
> > crit2 = (subA[:,0]==1+j)
> > subB=subA[crit2,1:]
> > C[i,j]=subB.sum()
>
> > print C
>
> > On 7/30/07, Geoffrey Zhu <zyzhu2... at gmail.com> wrote:
> >> Hi Everyone,
>
> >> I am wondering what is the best (and fast) way to build a pivot table
> >> aside from the 'brute force way?'
>
> >> I want to transform an numpy array into a pivot table. For
> >> example, if
> >> I have a numpy array like below:
>
> >> Region Date # of Units
> >> ---------- ---------- --------------
> >> East 1/1 10
> >> East 1/1 20
> >> East 1/2 30
> >> West 1/1 40
> >> West 1/2 50
> >> West 1/2 60
>
> >> I want to transform this into the following table, where f() is a
> >> given aggregate function:
>
> >> Date
> >> Region 1/1 1/2
> >> ----------
> >> East f(10,20) f(30)
> >> West f(40) f(50,60)
>
> >> I can regroup them into 'sets' and do it the brute force way, but
> >> that
> >> is kind of slow to execute. Does anyone know a better way?
>
> >> Thanks,
> >> Geoffrey
> >> _______________________________________________
> >> Numpy-discussion mailing list
> >> Numpy-discuss... at scipy.org
> >>http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
> > _______________________________________________
> > Numpy-discussion mailing list
> > Numpy-discuss... at scipy.org
> >http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discuss... at scipy.orghttp://projects.scipy.org/mailman/listinfo/numpy-discussion
More information about the NumPy-Discussion
mailing list