"groupby" is brilliant!

Frank Millman frank at chagford.com
Tue Jun 13 02:29:11 EDT 2006


Hi all

This is probably old hat to most of you, but for me it was a
revelation, so I thought I would share it in case someone has a similar
requirement.

I had to convert an old program that does a traditional pass through a
sorted data file, breaking on a change of certain fields, processing
each row, accumulating various totals, and doing additional processing
at each break. I am not using a database for this one, as the file
sizes are not large - a few thousand rows at most. I am using csv
files, and using the csv module so that each row is nicely formatted
into a list.

The traditional approach is quite fiddly, saving the values of the
various break fields, comparing the values on each row with the saved
values, and taking action if the values differ. The more break fields
there are, the fiddlier it gets.

I was going to do the same in python, but then I vaguely remembered
reading about 'groupby'. It took a little while to figure it out, but
once I had cracked it, it transformed the task into one of utter
simplicity.

Here is an example. Imagine a transaction file sorted by branch,
account number, and date, and you want to break on all three.

-----------------------------
import csv
from itertools import groupby
from operator import itemgetter

BRN = 0
ACC = 1
DATE = 2

reader = csv.reader(open('trans.csv', 'rb'))
rows = []
for row in reader:
    rows.append(row)

for brn,brnList in groupby(rows,itemgetter(BRN)):
    for acc,accList in groupby(brnList,itemgetter(ACC)):
        for date,dateList in groupby(accList,itemgetter(DATE)):
            for row in dateList:
                [do something with row]
            [do something on change of date]
        [do something on change of acc]
    [do something on change of brn]
-----------------------------

Hope someone finds this of interest.

Frank Millman




More information about the Python-list mailing list