[Python-ideas] [Python-Dev] minmax() function returning (minimum, maximum) tuple of a sequence

Guido van Rossum guido at python.org
Tue Oct 26 17:43:00 CEST 2010


On Tue, Oct 26, 2010 at 3:10 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> Perhaps I'm missing something, but to my mind, that's an awfully complicated
> solution for such a simple problem. Here's my attempt:
>
> def multi_reduce(iterable, funcs):
>    it = iter(iterable)
>    collectors = [next(it)]*len(funcs)
>    for x in it:
>        for i, f in enumerate(funcs):
>            collectors[i] = f(collectors[i], x)
>    return collectors
>
> I've called it multi_reduce rather than parallel_reduce, because it doesn't
> execute the functions in parallel. By my testing on Python 3.1.1,
> multi_reduce is consistently ~30% faster than the generator based solution
> for lists with 1000 - 10,000,000 items.
>
> So what am I missing? What does your parallel_reduce give us that
> multi_reduce doesn't?

You're right, the point I wanted to prove was that generators are
better than threads, but the code was based on emulating reduce(). The
generalization that I was aiming for was that it is convenient to
write a generator that does some kind of computation over a sequence
of items and returns a result at the end, and then have a driver that
feeds a single sequence to a bunch of such generators. This is more
powerful once you try to use reduce to compute e.g. the average of the
numbers fed to it -- of course you can do it using a function of
(state, value) but it is so much easier to write as a loop! (At least
for me -- if you do nothing but write Haskell all day I'm sure it
comes naturally. :-)

def avg():
  total = 0
  count = 0
  try:
    while True:
      value = yield
      total += value
      count += 1
  except GeneratorExit:
    raise StopIteration(total / count)

The essential boilerplate here is

  try:
    while True:
      value = yield
      <use value>
  except GeneratorExit:
    raise StopIteration(<compute result>)

No doubt functional aficionados will snub this, but in Python, this
should run much faster than the same thing written as a reduce-ready
function, due to the per-item function call overhead (which wasn't a
problem in the min/max example, since those are built-ins).
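
For concreteness, here is a minimal sketch of such a driver (not the
exact code from earlier in the thread; the names are illustrative, and
it is written for current Python, where a generator can simply return
its result):

def avg():
  # Same accumulator as above, except the result is delivered with a
  # plain return, which surfaces as a StopIteration carrying the value.
  total = 0
  count = 0
  try:
    while True:
      value = yield
      total += value
      count += 1
  except GeneratorExit:
    return total / count

def parallel_reduce(iterable, genfuncs):
  # Instantiate and prime each generator so it is paused at its first yield.
  gens = [g() for g in genfuncs]
  for g in gens:
    next(g)
  # Feed every item of the single input sequence to all generators.
  for value in iterable:
    for g in gens:
      g.send(value)
  # Throw GeneratorExit into each generator; its result comes back as the
  # value carried by the StopIteration that the finished generator raises.
  results = []
  for g in gens:
    try:
      g.throw(GeneratorExit)
    except StopIteration as stop:
      results.append(stop.value)
  return results

print(parallel_reduce([1, 2, 3, 4], [avg]))  # -> [2.5]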

BTW, this episode led me to better understand my objection to
reduce() as the universal hammer: my problem with writing avg() using
reduce is that the function one feeds into reduce is asymmetric -- its
first argument must be some state, e.g. a tuple (total, count), and
the second argument must be the next value. This is the moment that my
head reliably explodes -- even though it has no problem visualizing
reduce() using a *symmetric* function like +, min or max.

Also note that the reduce() based solution would have to have a
separate function to extract the desired result (total / count) from
the state (total, count), and for multi_reduce() you would have to
supply a separate list of such extraction functions, or use some other
hacky approach.
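
For contrast, here is roughly what that looks like with reduce() (just
a sketch, not code from the thread): the folding function is
asymmetric, and a separate step is still needed to turn the state into
the answer.

from functools import reduce

def avg_reduce(values):
  # The reduce-ready function is asymmetric: its first argument is the
  # running state (total, count), its second is the next plain value.
  total, count = reduce(lambda state, x: (state[0] + x, state[1] + 1),
                        values, (0, 0))
  # Extracting the desired result from the state is a separate step.
  return total / count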

-- 
--Guido van Rossum (python.org/~guido)


