sum() vs. loop

Tue Oct 12 21:52:02 EDT 2021

On Wed, Oct 13, 2021 at 12:36 PM Avi Gross via Python-list
<python-list at python.org> wrote:
>
> Alan,
>
> I am also wondering about that zip() function call to bind the two lists
> into a sort of iterator object. Presumably that calls the iterator N times.
> I did a test where I made two list called A and B and used zip to make an
> object holding the two and then removed A and B. I was able to print the
> list of tuples just fine with print(list(C)) so clearly it is not so much a
> pure iterator as one that holds yet another copy of both lists!

What do you mean by "removed" here? Simply removing the name that
refers to it doesn't destroy the list; an iterator will keep the list
alive.

> Now the old-fashioned C way, might simply use old fashioned but highly
> efficient pointer arithmetic to move through two lists or arrays or whatever
> of the same length adding as it goes so one pass period. Or it could use
> indexing as in sum += A[i] * B[i] ...

It's basically that, yup. Of course, it's not defined by C behaviour,
but the effect is the same.

> Is there a version of zip() or an alternative that might be faster? In
> particular, I was wondering the usual question of what happens if you might
> abort early if something is noticed, like say a negative number. If you
> pre-calculated and copied 100,000 items and abort after 5, ...

The purpose of zip is to represent the concept of iterating over two
things at once. Aborting early is fine, just as it is with other
iterations. It's incredibly convenient to be able to write something
like:

for tradegood, value in zip(data.tradegoods, data.tradevalues):
    ...

(Not exact code, but something very similar to what I've done while
parsing game savefiles.)

> I note an approach in some languages using a vectorized approach may be
> faster. Forget lists and as long as A and B are equal length vectors or
> arrays as in numpy, there are ways to ask to multiply two arrays in
> vectorized fashion so effectively you can say something akin to:
>
> sum(A*B)
>
> The above may be highly efficient especially if the underlying code is in
> C/C++.

Or Fortran :)

> If there is no objection in your code to using the numpy module and the
> corresponding objects you can make a similar test with something like this:
>
> And the function they seem to want is the dot product of two such arrays as
> in numpy.dot(A, B) provides the sum of the many products of corresponding
> entries in A and B.

If you're worried about the performance of summing pairwise products
of numbers, you should probably be using numpy already. It's
intellectually entertaining to explore the performance implications of
various types of iteration, but numpy is basically always going to win
any race that involves large numbers of floats :) It's like having a
hundred-yard-dash involving a bunch of schoolchildren, and one Olympic
runner - which of the schoolkids was fastest?

ChrisA