Pivot Table/Groupby/Sum question
petr.jakes.tpc at gmail.com
petr.jakes.tpc at gmail.com
Thu Jan 3 12:48:07 EST 2008
On Jan 3, 3:41 pm, patrick.wa... at gmail.com wrote:
> Yes in the sense that the top part will have merged cells so that
> Horror and Classics don't need to be repeated every time, but the
> headers aren't the important part. At this point I'm more interested
> in organizing the data itself and i can worry about putting it into a
> new excel file later.
What I am able to do is to use SQL, I think it is quite simple, about
50 rows of code including sample data and comments. It works for me
and IMHO it is easy tu understand and I think you can use as it is.
Otherwise you can control Excel from Python using win32api, win32com
( http://tinyurl.com/2m3x3v )
HTH
Petr Jakes
#!/usr/bin/env python
# -*- coding: cp1250 -*-
import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()
normalizedInputData=[]
subCategories=[]
rawData = [['Italy', 'Horror', '70s', 'Suspiria','Dario Argento', 4],
['Italy', 'Classics', 'Neo-Realist', 'Otto eMezzo', 'Fellini',
3],
['GB', 'Classics', 'Neo-Humoristic', 'Otto eMezzo', 'Fellini',
3],
['Fr', 'Horror', 'Neo-Realist', 'Otto eMezzo', 'Fellini', 8],
['Fr', 'Classics', 'Neo-Realist', 'Otto eMezzo', 'Fellini',
55],
['GB', 'Horror', 'Neo-Realist', 'Otto eMezzo', 'Fellini', 5],
['Italy', 'Horror', '70s', 'Profondo Rosso','Dario Argento',
4]]
def alphanum(s):
"""only letters, numbers and '_' are acceptable for column names
by SQL"""
filtered=''
for ch in s:
if ch.isalnum() or ch in '_':
filtered+=ch
return filtered
for myRow in rawData :
cat_SubCat = alphanum("_".join(myRow[1:3]))
if cat_SubCat not in subCategories:
subCategories.append(cat_SubCat)
myRow[1:3] = [cat_SubCat]
normalizedInputData.append(myRow)
def data_generator(dataSet):
for dataSetRow in dataSet:
yield dataSetRow
subCategories=sorted(subCategories)
# create SQL table named "MOVIES" with the apropriate fields (the tabe
is store in the memory only)
cur.execute("create table MOVIES(COUNTRY, CATEGORY, TITLE, DIRECTOR,
QUANTITY)")
# fill the table with data
cur.executemany("""insert into MOVIES(COUNTRY, CATEGORY, TITLE,
DIRECTOR, QUANTITY) values (?,?,?,?,?)""",
data_generator(normalizedInputData))
# assemble dynamic SQL SELECT query, which returns PIVOT TABLE
prologue = "select COUNTRY, SUM(QUANTITY) AS TOTAL, "
template = "SUM (CASE CATEGORY WHEN '%s' THEN QUANTITY ELSE 0 END) %s"
epilogue = " FROM MOVIES GROUP BY 1 ORDER BY 1"
pivotSelect = prologue + ", ".join([template % (x, x) for x in
subCategories]) + epilogue
# execute SQL SELECT and return data row by row
cur.execute(pivotSelect)
for row in cur.fetchall():
print row
More information about the Python-list
mailing list