Index of maximum element in list

Sun Jan 27 06:32:12 EST 2008

On Jan 26, 10:07 pm, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Sat, 26 Jan 2008 12:40:26 -0800, Paul Rubin wrote:
> > bearophileH... at lycos.com writes:
> >> > def posmax(seq, key=lambda x:x):
> >> >    return max(enumerate(seq), key=lambda k: key(k[1]))[0]
>
> >> Is the Python max able to tell that's the identity function? I don't
> >> think so, so your version may be slower and more memory hungry in the
> >> common situation where you don't have a key function. So I think my
> >> version is better :-)
>
> > I don't think there will be a noticable memory increase.  It's not a DSU
> > situation like in sorting, it's just a small constant parameter. Yes
> > there might be a small speed hit.  The compiler in principle could
> > convert the (lambda k: lambda x: k[1]) to something like
> > operator.itemgetter(1), but I doubt that it actually does so.
>
> Actually there is a big speed hit. Until such time that Python has a fast
> identity function, and lambda x:x isn't it, your version is about three
> times slower than Bearophile's.
>
> Much to my surprise, the fastest solution I've tried appears to be a pure
> Python version not even using max() at all.
>
> Here are the three versions:
>
> def posmax1(seq, key=None):  # bearophile's version
>     """Return the position of the first maximum item of a sequence.
>     It accepts the usual key parameter too."""
>     if key:
>         return max(enumerate(seq), key=lambda k: key(k[1]))[0]
>     else:
>         return max(enumerate(seq), key=itemgetter(1))[0]
>
> def posmax2(seq, key=lambda x:x):  # Paul Rubin's version
>     return max(enumerate(seq), key=lambda k: key(k[1]))[0]
>
> def posmax3(seq, key=None):  # Pure Python version
>     if key is None:
>         m = seq[0]
>         index = 0
>         for i, x in enumerate(seq):
>             if x > m:
>                 m = x
>                 index = i
>         return index
>     else:
>         return NotImplemented
>
> And my timing results:
>
> >>> alist = [3]*1000 + [5] + [3]*1000
> >>> assert alist.index(5) == 1000
> >>> assert posmax1(alist) == posmax2(alist) == posmax3(alist) == 1000
>
> >>> min(timeit.Timer('posmax1(alist)',
>
> ...     'from __main__ import posmax1, alist').repeat(number=1000))/1000
> 0.00074387979507446289>>> min(timeit.Timer('posmax2(alist)',
>
> ...     'from __main__ import posmax2, alist').repeat(number=1000))/1000
> 0.0022983889579772949>>> min(timeit.Timer('posmax3(alist)',
>
> ...     'from __main__ import posmax3, alist').repeat(number=1000))/1000
> 0.00063846302032470701
>
> --
> Steven

I don't think one should dismiss the simplest solution l.index(max(l))
out of hand (benchmark function borrowed from bearophile):

=================== posmax.py ==================
from operator import itemgetter

def simple_posmax(l, key=None):
    if key:
        return l.index(max(l, key=key))
    else:
        return l.index(max(l))

def clever_posmax(seq, key=None):
    if key:
        return max(enumerate(seq), key=lambda k: key(k[1]))[0]
    else:
        return max(enumerate(seq), key=itemgetter(1))[0]

def posmax_benchmark(posmax):
    from time import clock
    alist = [3]*1000 + [5] + [3]*1000

    t = clock()
    for _ in xrange(6000):
        r = posmax(alist)
    print posmax.__name__, round(clock() - t, 2), r

if __name__ == "__main__":
    posmax_benchmark(clever_posmax)
    posmax_benchmark(simple_posmax)
=============================================

marigold:python arno$ python posmax.py
clever_posmax 3.25 1000
simple_posmax 0.95 1000

simple_posmax is more than 3x faster on my machine.  It's not
surprising as even though the list is walked twice, it is all done in
C and no new objects have to be created. Then only non-C bit is when
the result of max(l) is fed to l.index().

--
Arnaud