groupby

Sat May 27 12:52:20 EDT 2006

"Paul McGuire" <ptmcg at austin.rr._bogus_.com> wrote in message
news:bzxcg.53765$Qq.42913 at tornado.texas.rr.com...
> So here's how to save the values from the iterators while iterating over
> the groupby:
>
> >>> m = [(x,list(y)) for x,y in groupby([1, 1, 1, 2, 2, 3])]
> >>> m
> [(1, [1, 1, 1]), (2, [2, 2]), (3, [3])]
>
> -- Paul
>
>
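The list() call matters because of how groupby shares its input: the itertools documentation notes that the groups use the same source iterator, so once groupby advances, the previous group is no longer visible. A minimal sketch of the difference (the empty lists in the first result are what current CPython 3 produces):

```python
from itertools import groupby

data = [1, 1, 1, 2, 2, 3]

# Collect the (key, group) pairs first, without consuming the groups.
# By the time the comprehension finishes, groupby has advanced past
# every group, so each saved group iterator is already stale.
pairs = [(k, g) for k, g in groupby(data)]
stale = [(k, list(g)) for k, g in pairs]

# Consuming each group with list() while groupby is still on it
# preserves the values (the approach quoted above).
saved = [(k, list(g)) for k, g in groupby(data)]

print(stale)  # [(1, []), (2, []), (3, [])]
print(saved)  # [(1, [1, 1, 1]), (2, [2, 2]), (3, [3])]
```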

Playing some more with groupby.  Here's a one-liner to tally a list of
integers into a histogram:

import itertools
import random

# create data set, random selection of numbers from 1-10
dataValueRange = range(1,11)
data = [random.choice(dataValueRange) for i in range(10)]
print(data)

# tally values into histogram
# (reading the expression from the inside out):
# - sort data into ascending order, so groupby will see all like values
#   together
# - call groupby, which returns an iterator of (value, valueItemIterator) tuples
# - tally the groupby results into a dict mapping value -> valueFrequency
# - expand the dict into a histogram list, filling in zeroes for any keys
#   that didn't get a value
hist = [ (k1,dict((k,len(list(g))) for k,g in
          itertools.groupby(sorted(data))).get(k1,0)) for k1 in dataValueRange ]

print(hist)

Gives:
[9, 6, 8, 3, 2, 3, 10, 7, 6, 2]
[(1, 0), (2, 2), (3, 2), (4, 0), (5, 0), (6, 2), (7, 1), (8, 1), (9, 1), (10, 1)]
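The sort step in the comments is essential because groupby only merges *adjacent* equal values; on unsorted input the same value shows up as several separate runs. A short check using the sample data printed above:

```python
from itertools import groupby

data = [9, 6, 8, 3, 2, 3, 10, 7, 6, 2]

# Without sorting, groupby starts a new group every time the value changes,
# so 3, 6 and 2 each appear twice, as separate one-item runs.
unsorted_runs = [(k, len(list(g))) for k, g in groupby(data)]

# After sorting, equal values are adjacent and each key appears once.
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(data))]

print(unsorted_runs)  # [(9, 1), (6, 1), (8, 1), (3, 1), (2, 1), (3, 1), (10, 1), (7, 1), (6, 1), (2, 1)]
print(sorted_runs)    # [(2, 2), (3, 2), (6, 2), (7, 1), (8, 1), (9, 1), (10, 1)]
```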

Change the generation of the original data list to 10,000 values, and you
get something like:
[(1, 995), (2, 986), (3, 941), (4, 998), (5, 978), (6, 1007), (7, 997), (8, 1033), (9, 1038), (10, 1027)]
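One thing to watch at this size: in the one-liner, the dict(...) expression sits inside the outer list comprehension, so it is rebuilt (sorting and grouping all the data again) once for every key in dataValueRange, ten times here. A sketch of the same histogram with the tally built only once; the helper name histogram is a hypothetical choice for illustration, and the fixed seed is only there so repeated runs produce the same data:

```python
import itertools
import random

def histogram(data, value_range):
    """Hypothetical helper: tally data into [(value, frequency), ...] for
    every value in value_range, with 0 for values that never occur."""
    # Sort so equal values are adjacent, then count each groupby run, once.
    counts = dict((k, len(list(g))) for k, g in itertools.groupby(sorted(data)))
    # Expand to the full range, filling in zeroes for missing keys.
    return [(v, counts.get(v, 0)) for v in value_range]

random.seed(12345)  # fixed seed, only so repeated runs give the same data
dataValueRange = range(1, 11)
data = [random.choice(dataValueRange) for i in range(10000)]
hist = histogram(data, dataValueRange)
print(hist)
```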

If you know there won't be any zero-frequency values (or don't care about
them), you can skip the fill-in-the-zeros step with one of these
expressions:
histAsList = [ (k,len(list(g))) for k,g in itertools.groupby(sorted(data)) ]
histAsDict = dict((k,len(list(g))) for k,g in
itertools.groupby(sorted(data)))
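With the sample data from the first run, the two expressions give the results below; values that never occur (1, 4 and 5 here) are simply absent, and the dict form can still report them as 0 through .get(). For comparison, collections.Counter (a different tool from groupby, available from Python 2.7 on) produces the same counts in one pass without sorting:

```python
import itertools
from collections import Counter

data = [9, 6, 8, 3, 2, 3, 10, 7, 6, 2]

# One (value, count) pair per distinct value, in ascending order.
histAsList = [ (k,len(list(g))) for k,g in itertools.groupby(sorted(data)) ]
# Same counts keyed by value; absent values can be read back as 0 via .get().
histAsDict = dict((k,len(list(g))) for k,g in itertools.groupby(sorted(data)))

# Counter tallies unsorted data directly; missing keys read as 0.
tally = Counter(data)

print(histAsList)                 # [(2, 2), (3, 2), (6, 2), (7, 1), (8, 1), (9, 1), (10, 1)]
print(histAsDict.get(4, 0))       # 0
print(dict(tally) == histAsDict)  # True
```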

-- Paul