maximum() efficency

Sun Mar 26 12:26:37 EST 2006

I was looking at a Python function to find the maximum from a list. 
The original was more complicated; I simplified it.  The built-in max()
function can replace the simplified example, but not the original.

def maximum(lst):
  maxval = lst[0]

  for i in xrange(1, len(lst)):
    v = lst[i]
    if maxval > v:
      maxval = v

  return maxval

Immediately I started thinking about ways to make this more efficient.  My
first thought was to use iterators:

def maximum(lst):
  itr = iter(lst)
  maxval = itr.next()

  for v in itr:
    if maxval > v:
      maxval = v

  return maxval

Then I thought, I wonder if there is a faster solution that *doesn't* use
iterators, for use with older versions of Python.  I came up with:

def maximum(lst):
  a = []
  for v in lst:
    try:
      if a[0] > v:
        a[0] = v
    except IndexError:
      a.append(v)
  return a[0]

Of course, it's a little ugly to use a[0] instead of "maxval".  And I'm
trying not to index a list, and here I am indexing another list.  So:

def maximum(lst):
  for v in lst:
    try:
      if maxval > v:
        maxval = v
    except NameError:
      maxval = v
  return maxval

And we come at last to my actual question:  Is this guaranteed to work on
all versions of Python?

In my testing on my computer, the above function works perfectly.  Even
if you do this:

>>> maxval = 100
>>> print maximum(cmp, [2, 0, 3, 1])
3

In other words, you get a NameError in the function even if there is a
global variable with that name.  That's with Python 2.4, however... is
this function guaranteed to work in all Python versions?  I am very
comfortable with the a[0] version, since I explicitly set a to an empty
list, I know a[0] will always raise that IndexError.

P.S. I went ahead and benchmarked the above functions, plus one more
that is similar to the a[0] solution but used an empty dictionary instead
of a zero-length list.  I created a list of a million zeroes, and then
stored a 1 at the beginning of the list.  Thus the maximum functions would
spend all their time iterating through values and comparing them, and very
little time updating maxval or its equivalent.  Times to run each function
100 times on the large list:

36.8 seconds  xrange
22.3 seconds  iterator
39.2 seconds  IndexError on a[0]
31.5 seconds  NameError with maxval
43.4 seconds  KeyError on empty dictionary

The conclusions I draw from these numbers:

The sneaky tricks with exceptions don't bring enough speed to be worth
using; two of them were actually slower than the xrange solution.  Even
the NameError one was only a little bit faster, and simple is better than
complex, so I don't think I'd ever actually use it.

The clear winner was the iterator version. It was much faster than the
others, and in my opinion it is simpler and easier to understand than any
of the others.
-- 
Steve R. Hastings    "Vita est"
steve at hastings.org    http://www.blarg.net/~steveha