Summing a 2D list

Fri Jun 13 11:13:25 EDT 2008

Le Friday 13 June 2008 14:12:40 Karsten Heymann, vous avez écrit :
> Hi Mark,
>
> Mark <markjturner at gmail.com> writes:
> > I have a scenario where I have a list like this:
> >
> > User            Score
> > 1                 0
> > 1                 1
> > 1                 5
> > 2                 3
> > 2                 1
> > 3                 2
> > 4                 3
> > 4                 3
> > 4                 2
> >
> > And I need to add up the score for each user to get something like
> > this:
> >
> > User            Score
> > 1                 6
> > 2                 4
> > 3                 2
> > 4                 8
> >
> > Is this possible? If so, how can I do it? I've tried looping through
> > the arrays and not had much luck so far.
>
> Although your problem has already been solved, I'd like to present a
> different approach which can be quite a bit faster. The most common
> approach seems to be using a dictionary:
>
> summed_up={}
> for user,vote in pairs:
>   if summed_up.has_key(user):
>     summed_up[user]+=vote
>   else:
>     summed_up[user]=vote
>
> But if the list of users is compact and the maximum value is known
> before, the using a list and coding the user into the list position is
> much more elegant:
>

So, writing C in python, which has dictionnary as builtin type, should be 
considered "more elegant" ?

> summed_up=list( (0,) * max_user )
> for user,vote in pairs:
>   summed_up[user] += vote
>
> I've run a quick and dirty test on these approaches and found that the
> latter takes only half the time than the first. More precisely, with
> about 2 million pairs, i got:
>

> * dict approach: 2s
>         (4s with "try: ... except KeyError:" instead of the "if")
> * list approach: 0.9s
>

You are comparing apples with lemons, there is no such a difference between 
list index access and dictionnary key access in Python.

> BTW this was inspired by the book "Programming Pearls" I read some
> years ago where a similar approach saved some magnitudes of time
>(using a bit field instead of a list to store reserved/free phone
> numbers IIRC).

If you know in advance the number and names of users, what prevent you to 
initialize completelly the target dictionnary ?

The following code compare the same algorithm, once with list and the second 
time with dict :

#!/usr/bin/env python

def do(f, u, v) :
    from time import time
    n=time()
    f(u, v)
    return time() -n

def c_dict(users, votes) :
    d = dict(((e, 0) for e in users))
    for u, v in votes : d[u] += v
    return d.values()

def c_list(users, votes) :
    d = [ 0 for i in users ]
    for u, v in votes : d[u] += v
    return d

u = range(3000)

import random

v = list((u[r%3000], random.randint(0,10000)) for r in range(5*10**6))

print "with list", do(c_list, u, v)
print "with dict", do(c_dict, u, v)

The result is pretty close now :

maric at redflag1 17:04:36:~$ ./test.py
with list 1.40726399422
with dict 1.63094091415

So why use list where the obvious and natural data structure is a 
dictionnary ?

-- 
_____________

Maric Michaud