"groupby" is brilliant!

vpr mvanaswegen at gmail.com
Tue Jun 13 03:24:29 EDT 2006


Hi Frank

This is one of the reasons why I love Python, you can write readable
code.
I strive to write clean code but I find that exception handling code
e.g. try:
makes my code ugly and significantly harder to read. Does anyone have
any good
pointers for a former C++ / Perl coder.

/vpr


Frank Millman wrote:
> Hi all
>
> This is probably old hat to most of you, but for me it was a
> revelation, so I thought I would share it in case someone has a similar
> requirement.
>
> I had to convert an old program that does a traditional pass through a
> sorted data file, breaking on a change of certain fields, processing
> each row, accumulating various totals, and doing additional processing
> at each break. I am not using a database for this one, as the file
> sizes are not large - a few thousand rows at most. I am using csv
> files, and using the csv module so that each row is nicely formatted
> into a list.
>
> The traditional approach is quite fiddly, saving the values of the
> various break fields, comparing the values on each row with the saved
> values, and taking action if the values differ. The more break fields
> there are, the fiddlier it gets.
>
> I was going to do the same in python, but then I vaguely remembered
> reading about 'groupby'. It took a little while to figure it out, but
> once I had cracked it, it transformed the task into one of utter
> simplicity.
>
> Here is an example. Imagine a transaction file sorted by branch,
> account number, and date, and you want to break on all three.
>
> -----------------------------
> import csv
> from itertools import groupby
> from operator import itemgetter
>
> BRN = 0
> ACC = 1
> DATE = 2
>
> reader = csv.reader(open('trans.csv', 'rb'))
> rows = []
> for row in reader:
>     rows.append(row)
>
> for brn,brnList in groupby(rows,itemgetter(BRN)):
>     for acc,accList in groupby(brnList,itemgetter(ACC)):
>         for date,dateList in groupby(accList,itemgetter(DATE)):
>             for row in dateList:
>                 [do something with row]
>             [do something on change of date]
>         [do something on change of acc]
>     [do something on change of brn]
> -----------------------------
> 
> Hope someone finds this of interest.
> 
> Frank Millman




More information about the Python-list mailing list