sum() vs. loop

Tue Oct 12 21:34:42 EDT 2021

Alan,

I am also wondering about that zip() function call to bind the two lists
into a sort of iterator object. Presumably that calls the iterator N times.
I did a test where I made two list called A and B and used zip to make an
object holding the two and then removed A and B. I was able to print the
list of tuples just fine with print(list(C)) so clearly it is not so much a
pure iterator as one that holds yet another copy of both lists!

Now the old-fashioned C way, might simply use old fashioned but highly
efficient pointer arithmetic to move through two lists or arrays or whatever
of the same length adding as it goes so one pass period. Or it could use
indexing as in sum += A[i] * B[i] ...

But yes, there are many good reasons to use clearer code with well-tested
parts even at the cost of some efficiency. 

Is there a version of zip() or an alternative that might be faster? In
particular, I was wondering the usual question of what happens if you might
abort early if something is noticed, like say a negative number. If you
pre-calculated and copied 100,000 items and abort after 5, ... 

I note an approach in some languages using a vectorized approach may be
faster. Forget lists and as long as A and B are equal length vectors or
arrays as in numpy, there are ways to ask to multiply two arrays in
vectorized fashion so effectively you can say something akin to:

sum(A*B)

The above may be highly efficient especially if the underlying code is in
C/C++.

If there is no objection in your code to using the numpy module and the
corresponding objects you can make a similar test with something like this:

And the function they seem to want is the dot product of two such arrays as
in numpy.dot(A, B) provides the sum of the many products of corresponding
entries in A and B.

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon.net at python.org> On
Behalf Of Alan Gauld
Sent: Tuesday, October 12, 2021 6:56 PM
To: python-list at python.org
Subject: Fwd: Re: sum() vs. loop

On 10/10/2021 09:49, Steve Keller wrote:
> I have found the sum() function to be much slower than to loop over 
> the operands myself:
>
> def sum_products(seq1, seq2):
> return sum([a * b for a, b in zip(seq1, seq2)])
>
> def sum_products2(seq1, seq2):
> sum = 0
> for a, b in zip(seq1, seq2):
> sum += a * b
> return sum
>
> In a program I generate about 14 million pairs of sequences of ints 
> each of length 15 which need to be summed. The first version with 
> sum() needs
> 44 seconds while the second version runs in 37 seconds.
>
> Can someone explain this difference?

I'm no expert on Python innards but it looks to me like the first version
loops over the length of the list twice, once to generate the list of
products and the second time to add them together.

The pure Python version only loops once because it does the addition in
transit.

Presumably, especially for large numbers, a single python loop is faster
than 2 C loops?

But that's purely my speculation.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

-- 
https://mail.python.org/mailman/listinfo/python-list