itertools.groupby usage to get structured data

Slafs slafs.e at gmail.com
Sat Feb 5 05:34:26 EST 2011


On 5 Feb, 05:58, Paul Rubin <no.em... at nospam.invalid> wrote:
> Slafs <slaf... at gmail.com> writes:
> > What I want to have is:
> > a "big" nested dictionary with 'g1' values as 1st level keys and a
> > dictionary of aggregates and "subgroups" in it....
>
> > I was looking for a solution that would let me do that kind of
> > grouping with variable lists of 2) and 3) i.e. having also 'g3' as
> > grouping element so the 'g2' dicts could also have their own
> > "subgroup" and be even more nested then.
> > I was trying something with itertools.groupby and updating nested
> > dicts, but as I was writing the code it started to feel too verbose to
> > me :/
>
> > Do you have any hints, maybe? Because I'm kind of stuck :/
>
> I'm not sure I understood the problem and it would help if you gave
> sample data with the deeper nesting that you describe.  But the
> following messy code matches the sample that you did give:
>
>     from pprint import pprint
>     from itertools import groupby
>
>     x1 = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
>               { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
>               { 'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}
>               ]
>     x2 = ['g1', 'g2']
>     x3 = ['s_v1', 's_v2']
>
>     def agg(xdata, group_keys, agg_keys):
>         if not group_keys:
>             return {}
>         k0, ks = group_keys[0], group_keys[1:]
>         r = {}
>         def gk(d): return d[k0]
>         for k, g in groupby(sorted(xdata, key=gk), gk):
>             gs = list(g)
>             aggs = dict((ak,sum(d[ak] for d in gs)) for ak in agg_keys)
>             r[k] = aggs
>             if ks:
>                 r[k][ks[0]] = agg(gs,group_keys[1:], agg_keys)
>         return r
>
>     pprint (agg(x1, x2, x3))

Thank you both Steven and Paul for your replies.

@Steven:
> Perhaps you should consider backing up and starting from somewhere else
> with different input data, or changing the requirements. Just a thought.

I don't think that's the issue. The data, as you noticed, is well structured
(as a table, for instance), and I don't think I can do better than that.

> I don't think groupby is the tool you want. It groups *consecutive* items
> in sequences:

I was using groupby just like in Paul's code.
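
To illustrate the point about consecutive items, here is a minimal sketch
with made-up rows (the data and the name g1_key are only for illustration,
not from my real code). Sorting by the grouping key first, as Paul's code
does, is what makes equal keys end up next to each other:

```python
from itertools import groupby

# Hypothetical sample: the 'g1' values are deliberately out of order.
rows = [{'g1': 1}, {'g1': 2}, {'g1': 1}]

def g1_key(d):
    return d['g1']

# Without sorting, the two g1 == 1 rows fall into separate groups.
unsorted_keys = [k for k, _ in groupby(rows, g1_key)]

# Sorting by the same key first makes equal keys consecutive.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=g1_key), g1_key)]

print(unsorted_keys)  # [1, 2, 1]
print(sorted_keys)    # [1, 2]
```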

@Paul:
OMG. I think this is it! (picking my jaw up off the floor...)
The funny part is that I was kind of close to this solution ;). I was
considering the use of recursion for this.
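
For the deeper nesting I mentioned, the same recursive idea seems to carry
over unchanged. Below is a sketch that restates Paul's function with longer
names and runs it on three grouping keys. The 'g3' column and its values are
invented for this example, and the names aggregate, rows and result are
mine, so treat it as an illustration rather than a finished solution:

```python
from itertools import groupby
from pprint import pprint

def aggregate(rows, group_keys, agg_keys):
    """Recursively group rows by each key in group_keys, summing agg_keys."""
    if not group_keys:
        return {}
    first, rest = group_keys[0], group_keys[1:]

    def by_first(d):
        return d[first]

    result = {}
    # groupby only merges consecutive items, so sort by the same key first.
    for value, group in groupby(sorted(rows, key=by_first), by_first):
        members = list(group)
        result[value] = dict((ak, sum(d[ak] for d in members)) for ak in agg_keys)
        if rest:
            # Nest the next grouping level under the name of its key.
            result[value][rest[0]] = aggregate(members, rest, agg_keys)
    return result

# Hypothetical data: the 'g3' column is an assumption added for this sketch.
rows = [
    {'g1': 1, 'g2': 8, 'g3': 'a', 's_v1': 5.0, 's_v2': 3.5},
    {'g1': 1, 'g2': 8, 'g3': 'b', 's_v1': 1.0, 's_v2': 0.5},
    {'g1': 1, 'g2': 9, 'g3': 'a', 's_v1': 2.0, 's_v2': 3.0},
    {'g1': 2, 'g2': 8, 'g3': 'a', 's_v1': 6.0, 's_v2': 8.0},
]

result = aggregate(rows, ['g1', 'g2', 'g3'], ['s_v1', 's_v2'])
pprint(result)
```

Each level holds its own sums plus a sub-dictionary named after the next
grouping key, so adding a fourth key should only mean extending the list
passed as group_keys.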

Thank you so much!


