Summing a 2D list
Maric Michaud
maric at aristote.info
Fri Jun 13 11:13:25 EDT 2008
Le Friday 13 June 2008 14:12:40 Karsten Heymann, vous avez écrit :
> Hi Mark,
>
> Mark <markjturner at gmail.com> writes:
> > I have a scenario where I have a list like this:
> >
> > User Score
> > 1 0
> > 1 1
> > 1 5
> > 2 3
> > 2 1
> > 3 2
> > 4 3
> > 4 3
> > 4 2
> >
> > And I need to add up the score for each user to get something like
> > this:
> >
> > User Score
> > 1 6
> > 2 4
> > 3 2
> > 4 8
> >
> > Is this possible? If so, how can I do it? I've tried looping through
> > the arrays and not had much luck so far.
>
> Although your problem has already been solved, I'd like to present a
> different approach which can be quite a bit faster. The most common
> approach seems to be using a dictionary:
>
> summed_up={}
> for user,vote in pairs:
> if summed_up.has_key(user):
> summed_up[user]+=vote
> else:
> summed_up[user]=vote
>
> But if the list of users is compact and the maximum value is known
> before, the using a list and coding the user into the list position is
> much more elegant:
>
So, writing C in python, which has dictionnary as builtin type, should be
considered "more elegant" ?
> summed_up=list( (0,) * max_user )
> for user,vote in pairs:
> summed_up[user] += vote
>
> I've run a quick and dirty test on these approaches and found that the
> latter takes only half the time than the first. More precisely, with
> about 2 million pairs, i got:
>
> * dict approach: 2s
> (4s with "try: ... except KeyError:" instead of the "if")
> * list approach: 0.9s
>
You are comparing apples with lemons, there is no such a difference between
list index access and dictionnary key access in Python.
> BTW this was inspired by the book "Programming Pearls" I read some
> years ago where a similar approach saved some magnitudes of time
>(using a bit field instead of a list to store reserved/free phone
> numbers IIRC).
If you know in advance the number and names of users, what prevent you to
initialize completelly the target dictionnary ?
The following code compare the same algorithm, once with list and the second
time with dict :
#!/usr/bin/env python
def do(f, u, v) :
from time import time
n=time()
f(u, v)
return time() -n
def c_dict(users, votes) :
d = dict(((e, 0) for e in users))
for u, v in votes : d[u] += v
return d.values()
def c_list(users, votes) :
d = [ 0 for i in users ]
for u, v in votes : d[u] += v
return d
u = range(3000)
import random
v = list((u[r%3000], random.randint(0,10000)) for r in range(5*10**6))
print "with list", do(c_list, u, v)
print "with dict", do(c_dict, u, v)
The result is pretty close now :
maric at redflag1 17:04:36:~$ ./test.py
with list 1.40726399422
with dict 1.63094091415
So why use list where the obvious and natural data structure is a
dictionnary ?
--
_____________
Maric Michaud
More information about the Python-list
mailing list