related lists mean value (golfed)

Tue Mar 9 13:24:27 EST 2010

Peter Otten wrote:
> Michael Rudolf wrote:
>
> > Am 09.03.2010 13:02, schrieb Peter Otten:
> >> >>> [sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
> >> [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
> >> Peter
> >
> > ... pwned.
> > Should be the fastest and shortest way to do it.
>
> It may be short, but it is not particularly efficient. A dict-based approach
> is probably the fastest. If y is guaranteed to be sorted, itertools.groupby()
> may also be worth a try.
>
> $ cat tmp_average_compare.py
> from __future__ import division
> from collections import defaultdict
> try:
>     from itertools import izip as zip
> except ImportError:
>     pass
>
> x = [1 ,2, 8, 5, 0, 7]
> y = ['a', 'a', 'b', 'c', 'c', 'c' ]
>
> def f(x=x, y=y):
>     p = defaultdict(int)
>     q = defaultdict(int)
>     for a, b in zip(x, y):
>         p[b] += a
>         q[b] += 1
>     return [p[b]/q[b] for b in y]
>
> def g(x=x, y=y):
>     return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]
>
> if __name__ == "__main__":
>     print(f())
>     print(g())
>     assert f() == g()
> $ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
> 100000 loops, best of 3: 11.4 usec per loop
> $ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
> 10000 loops, best of 3: 22.8 usec per loop
>
> Peter
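
On the groupby() aside: here is a rough sketch of how that might look,
assuming equal labels in y are always adjacent (for instance because y is
sorted). The function name mean_by_run is my own and not from the thread, and
I use float() rather than the __future__ import so the snippet stands alone.

```python
from itertools import groupby

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

def mean_by_run(x, y):
    # Assumption: equal labels in y are adjacent (e.g. y is sorted).
    # groupby() only merges consecutive items, so a label that shows up
    # in two separate runs would be averaged separately for each run.
    result = []
    for _label, run in groupby(zip(x, y), key=lambda pair: pair[1]):
        values = [a for a, _b in run]
        mean = sum(values) / float(len(values))
        # Repeat the run's mean once per element of the run.
        result.extend([mean] * len(values))
    return result

print(mean_by_run(x, y))  # [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
```

I have not timed this against f() or g(), so I make no claim about speed.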

I converged on the same dict-based solution, but with an extra reduction step
in case the input contains a lot of repeats: each average is computed once per
distinct name and then looked up for every element. I think it is a good
compromise between efficiency, readability and succinctness.

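from __future__ import division  # true division on Python 2 as well
from collections import defaultdict

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

# Accumulate the total and the count for each distinct name.
totdct = defaultdict(int)
cntdct = defaultdict(int)
for name, num in zip(y, x):
    totdct[name] += num
    cntdct[name] += 1

# Reduction step: one division per distinct name, not one per element.
avgdct = {name: totdct[name]/cnts for name, cnts in cntdct.items()}
w = [avgdct[name] for name in y]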
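
As a quick sanity check, the same steps can be wrapped in a function and
compared against the golfed one-liner on input where the labels are not
sorted. This is only a sketch: the names mean_by_label and golfed, and the
sample data, are mine and not from the thread.

```python
from collections import defaultdict

def mean_by_label(x, y):
    # Same approach as above: totals and counts per label, then one
    # division per distinct label.
    totals = defaultdict(int)
    counts = defaultdict(int)
    for name, num in zip(y, x):
        totals[name] += num
        counts[name] += 1
    averages = {name: totals[name] / float(n) for name, n in counts.items()}
    return [averages[name] for name in y]

def golfed(x, y):
    # The one-liner from earlier in the thread, with float() added so it
    # does not depend on the __future__ import.
    return [sum(a for a, b in zip(x, y) if b == c) / float(y.count(c))
            for c in y]

# Hypothetical sample with interleaved (unsorted) labels.
x2 = [4, 1, 3, 2]
y2 = ['b', 'a', 'b', 'a']

assert mean_by_label(x2, y2) == golfed(x2, y2)
print(mean_by_label(x2, y2))  # [3.5, 1.5, 3.5, 1.5]
```

Unlike a groupby()-based version, the dict approach does not care whether
equal labels are adjacent.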