itertools.groupby usage to get structured data

Sat Feb 5 07:12:36 EST 2011

Slafs wrote:

> Hi there!
> 
> I'm having trouble wrapping my brain around this kind of problem:
> 
> What I have:
>   1) a list of dicts
>   2) a list of keys that I would like to use as grouping arguments for
> the elements of 1)
>   3) a list of keys that I would like to "aggregate" over the elements
> of 1) with some function, e.g. sum
> 
> For instance, I have:
> 1) [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
>       { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
>       {'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}, ... ]
> 2) ['g1', 'g2']
> 3) ['s_v1', 's_v2']
> 
> To be precise, 1) is the result of a values method call on a QuerySet
> in Django; 2) holds the arguments for that method; 3) holds the
> annotation keys. So 1) is the result of:
>    qs.values('g1', 'g2').annotate(s_v1=Sum('v1'), s_v2=Sum('v2'))
> 
> What I want to have is:
> a "big" nested dictionary with 'g1' values as 1st level keys and a
> dictionary of aggregates and "subgroups" in it.
> 
> In my example it would be something like this:
> {
>   1 : {
>           's_v1' : 7.0,
>           's_v2' : 6.5,
>           'g2' :{
>                    8 : {
>                           's_v1' : 5.0,
>                           's_v2' : 3.5 },
>                    9 :  {
>                           's_v1' : 2.0,
>                           's_v2' : 3.0 }
>                 }
>        },
>   2 : {
>            's_v1' : 6.0,
>            's_v2' : 8.0,
>            'g2' : {
>                     8 : {
>                           's_v1' : 6.0,
>                           's_v2' : 8.0}
>            }
>        },
> ...
> }
> 
> # notice the summed values of s_v1 and s_v2 when g1 == 1
> 
> I was looking for a solution that would let me do that kind of
> grouping with variable lists for 2) and 3), e.g. also having 'g3' as
> a grouping element, so that the 'g2' dicts could have their own
> "subgroups" and be nested even further.
> I was trying something with itertools.groupby and updating nested
> dicts, but as I was writing the code it started to feel too verbose to
> me :/
> 
> Do you have any hints, maybe? Because I'm kind of stuck :/
> 
> Regards
> 
> Sławek

Not super-efficient, but simple:

$ cat sumover.py
data = [ { 'g1' : 1, 'g2' : 8, 's_v1' : 5.0, 's_v2' : 3.5 },
         { 'g1' : 1, 'g2' : 9, 's_v1' : 2.0, 's_v2' : 3.0 },
         {'g1' : 2, 'g2' : 8, 's_v1' : 6.0, 's_v2' : 8.0}]
sum_over = ["s_v1", "s_v2"]
group_by = ["g1", "g2"]

wanted = {
  1 : {
          's_v1' : 7.0,
          's_v2' : 6.5,
          'g2' :{
                   8 : {
                          's_v1' : 5.0,
                          's_v2' : 3.5 },
                   9 :  {
                          's_v1' : 2.0,
                          's_v2' : 3.0 }
                }
       },
  2 : {
           's_v1' : 6.0,
           's_v2' : 8.0,
           'g2' : {
                    8 : {
                          's_v1' : 6.0,
                          's_v2' : 8.0}
           }
       },
}

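def calc(data, group_by, sum_over):
    tree = {}
    # The trailing None is a sentinel: it makes the loop add the sums
    # at the innermost level without descending any further.
    group_by = group_by + [None]
    for item in data:
        d = tree
        for g in group_by:
            # Accumulate this item's values into the current level.
            for so in sum_over:
                d[so] = d.get(so, 0.0) + item[so]
            if g is not None:
                # Descend to d[g][item[g]], creating the dicts on demand.
                d = d.setdefault(g, {}).setdefault(item[g], {})
    return tree

# The top level of the tree holds the grand totals plus the first
# grouping key; the wanted dict sits under that key.
got = calc(data, group_by, sum_over)[group_by[0]]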
assert got == wanted
$ python sumover.py
$

Untested.
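
Since the subject line asks about itertools.groupby, here is a rough sketch of how the same nested result could be built with it, for comparison with calc() above. This is only one possible way to do it: calc_groupby is a made-up name, and the recursion assumes that every dict contains all grouping and summing keys and that the grouping values can be sorted.

```python
from itertools import groupby
from operator import itemgetter


def calc_groupby(data, group_by, sum_over):
    """Hypothetical groupby-based variant of calc().

    Returns the nested dict keyed by the values of group_by[0].
    """
    if not group_by:
        return {}
    key, rest = group_by[0], group_by[1:]
    get_key = itemgetter(key)
    result = {}
    # groupby only merges adjacent items, so sort by the key first.
    for value, items in groupby(sorted(data, key=get_key), key=get_key):
        items = list(items)
        # Sums for this group over all requested columns.
        node = {so: sum(item[so] for item in items) for so in sum_over}
        if rest:
            # Subgroups go under the name of the next grouping key.
            node[rest[0]] = calc_groupby(items, rest, sum_over)
        result[value] = node
    return result


data = [{'g1': 1, 'g2': 8, 's_v1': 5.0, 's_v2': 3.5},
        {'g1': 1, 'g2': 9, 's_v1': 2.0, 's_v2': 3.0},
        {'g1': 2, 'g2': 8, 's_v1': 6.0, 's_v2': 8.0}]
result = calc_groupby(data, ["g1", "g2"], ["s_v1", "s_v2"])
```

Adding 'g3' to the group_by list nests one level deeper without further changes. The price is one sort per level, so the single-pass setdefault() approach is probably still the simpler choice.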