Pivot Table/Groupby/Sum question
patrick.waldo at gmail.com
patrick.waldo at gmail.com
Wed Jan 2 09:33:00 EST 2008
Sorry for the delay in my response. New Year's Eve and moving
apartment
> - Where the data come from (I mean: are your data in Excel already
> when you get them)?
> - If your primary source of data is the Excel file, how do you read
> data from the Excel file to Python (I mean did you solve this part of the task already)?
Yes, the data comes from Excel and I use xlrd and PyExcelerator to
read and write, respectively.
#open for reading
path_file = "c:\\1\\data.xls"
book = xlrd.open_workbook(path_file)
Counts = book.sheet_by_index(1)
#get data
n=1
data = []
while n<Counts.nrows:
data.append(Counts.row_values(n, start_colx=0, end_colx=None))
n=n+1
#
#Heres the part where I need to do the pivot table
#
#open a new Excel file for writing
wb=pyExcelerator.Workbook()
matrix = wb.add_sheet("matrix")
wb.save('c:\\1\\matrix.xls')
So the data comes in as a long list. I'm dealing with some
information on various countries with 6 pieces of information to
pivot. Just to make it simple it's like a video store database. The
data is like [Country, Category, Sub Category, Film Title, Director,
Number of Copies]. data = [['Italy', 'Horror', '70s', 'Suspiria',
'Dario Argento', 4],['Italy', 'Classics', 'Neo-Realist', 'Otto e
Mezzo', 'Fellini', 3],['Italy', 'Horror', '70s', 'Profondo Rosso',
'Dario Argento', 4],...]. So there are 4 copies of Suspiria and 3 of
8 1/2. What I want is the total number of films for each country,
category and subcategory, ie there are 11 Italian films and 8 Italian
horror films from the 70s, etc...I will then output the data like this
| Horror | Classics ...
Total | 70s Slasher | Neo-Realist Western ...
Total
America 200 20 30 0 10 ...
Argentina 304 1 0 0 0 ...
....
Italy 11 7 0 3 0 ...
I'm just making up data here, but that's the idea. As I said, Excel
does it in it's own way, but that whole idea here is to organize it
exactly how I want it in an automated way. It's an interesting
problem, one that I think a lot of people might want to use. I'm
surprised that Programming Python, with all it's talk of dictionaries,
doesn't have a section on organizing tabular data like this, or I'm
more likely missing something or not making a connection.
Any ideas, hints, or pointers on where I could learn more about this
would be helpful. Otherwise, I'm off to the drawing board!
More information about the Python-list
mailing list