Pivot Table/Groupby/Sum question

patrick.waldo at gmail.com patrick.waldo at gmail.com
Wed Jan 2 09:33:00 EST 2008


Sorry for the delay in my response.  New Year's Eve and moving
apartment

> - Where the data come from (I mean: are your data in Excel already
> when you get them)?
> - If your primary source of data is the Excel file, how do you read
> data from the Excel file to Python (I mean did you solve this part of the task already)?

Yes, the data comes from Excel and I use xlrd and PyExcelerator to
read and write, respectively.
#open for reading
path_file = "c:\\1\\data.xls"
book = xlrd.open_workbook(path_file)
Counts = book.sheet_by_index(1)
#get data
n=1
data = []
while n<Counts.nrows:
    data.append(Counts.row_values(n, start_colx=0, end_colx=None))
    n=n+1
#
#Heres the part where I need to do the pivot table
#
#open a new Excel file for writing
wb=pyExcelerator.Workbook()
matrix = wb.add_sheet("matrix")
wb.save('c:\\1\\matrix.xls')


So the data comes in as a long list.  I'm dealing with some
information on various countries with 6 pieces of information to
pivot.  Just to make it simple it's like a video store database.  The
data is like [Country, Category, Sub Category, Film Title, Director,
Number of Copies].  data = [['Italy', 'Horror', '70s', 'Suspiria',
'Dario Argento', 4],['Italy', 'Classics', 'Neo-Realist', 'Otto e
Mezzo', 'Fellini', 3],['Italy', 'Horror', '70s', 'Profondo Rosso',
'Dario Argento', 4],...].  So there are 4 copies of Suspiria and 3 of
8 1/2.  What I want is the total number of films for each country,
category and subcategory, ie there are 11 Italian films and 8 Italian
horror films from the 70s, etc...I will then output the data like this
                 | Horror            |  Classics              ...
           Total | 70s    Slasher    |  Neo-Realist  Western  ...
Total
America    200     20       30            0            10     ...
Argentina  304     1        0             0            0      ...
....
Italy      11      7        0             3            0      ...

I'm just making up data here, but that's the idea.  As I said, Excel
does it in it's own way, but that whole idea here is to organize it
exactly how I want it in an automated way.  It's an interesting
problem, one that I think a lot of people might want to use. I'm
surprised that Programming Python, with all it's talk of dictionaries,
doesn't have a section on organizing tabular data like this, or I'm
more likely missing something or not making a connection.

Any ideas, hints, or pointers on where I could learn more about this
would be helpful.  Otherwise, I'm off to the drawing board!



More information about the Python-list mailing list