Pivot Table/Groupby/Sum question

Thu Jan 3 12:48:07 EST 2008

On Jan 3, 3:41 pm, patrick.wa... at gmail.com wrote:
> Yes in the sense that the top part will have merged cells so that
> Horror and Classics don't need to be repeated every time, but the
> headers aren't the important part.  At this point I'm more interested
> in organizing the data itself and i can worry about putting it into a
> new excel file later.

What I am able to do is to use SQL, I think it is quite simple, about
50 rows of code including sample data and comments. It works for me
and IMHO it is easy tu understand and I think you can use as it is.
Otherwise you can control Excel from Python using win32api, win32com
( http://tinyurl.com/2m3x3v )

HTH

Petr Jakes

#!/usr/bin/env python
# -*- coding: cp1250 -*-

import sqlite3
con = sqlite3.connect(":memory:")
cur = con.cursor()

normalizedInputData=[]
subCategories=[]

rawData = [['Italy', 'Horror', '70s', 'Suspiria','Dario Argento', 4],
        ['Italy', 'Classics', 'Neo-Realist', 'Otto eMezzo', 'Fellini',
3],
        ['GB', 'Classics', 'Neo-Humoristic', 'Otto eMezzo', 'Fellini',
3],
        ['Fr', 'Horror', 'Neo-Realist', 'Otto eMezzo', 'Fellini', 8],
        ['Fr', 'Classics', 'Neo-Realist', 'Otto eMezzo', 'Fellini',
55],
        ['GB', 'Horror', 'Neo-Realist', 'Otto eMezzo', 'Fellini', 5],
        ['Italy', 'Horror', '70s', 'Profondo Rosso','Dario Argento',
4]]

def alphanum(s):
    """only letters, numbers and '_' are acceptable for column names
by SQL"""
    filtered=''
    for ch in s:
        if ch.isalnum() or ch in '_':
            filtered+=ch
    return filtered

for myRow in rawData :
    cat_SubCat = alphanum("_".join(myRow[1:3]))
    if cat_SubCat not in subCategories:
        subCategories.append(cat_SubCat)
    myRow[1:3] = [cat_SubCat]
    normalizedInputData.append(myRow)

def data_generator(dataSet):
    for dataSetRow in dataSet:
         yield dataSetRow

subCategories=sorted(subCategories)

# create SQL table named "MOVIES" with the apropriate fields (the tabe
is store in the memory only)
cur.execute("create table MOVIES(COUNTRY, CATEGORY, TITLE, DIRECTOR,
QUANTITY)")

# fill the table with data
cur.executemany("""insert into MOVIES(COUNTRY, CATEGORY, TITLE,
DIRECTOR, QUANTITY) values (?,?,?,?,?)""",
data_generator(normalizedInputData))

# assemble dynamic SQL SELECT query, which returns PIVOT TABLE
prologue = "select COUNTRY, SUM(QUANTITY) AS TOTAL, "
template = "SUM (CASE CATEGORY WHEN '%s' THEN QUANTITY ELSE 0 END) %s"
epilogue = " FROM MOVIES GROUP BY 1 ORDER BY 1"
pivotSelect = prologue + ", ".join([template % (x, x) for x in
subCategories]) + epilogue

# execute SQL SELECT and return data row by row
cur.execute(pivotSelect)
for row in cur.fetchall():
    print row