maximum() efficency
Steve R. Hastings
steve at hastings.org
Sun Mar 26 12:26:37 EST 2006
I was looking at a Python function to find the maximum from a list.
The original was more complicated; I simplified it. The built-in max()
function can replace the simplified example, but not the original.
def maximum(lst):
maxval = lst[0]
for i in xrange(1, len(lst)):
v = lst[i]
if maxval > v:
maxval = v
return maxval
Immediately I started thinking about ways to make this more efficient. My
first thought was to use iterators:
def maximum(lst):
itr = iter(lst)
maxval = itr.next()
for v in itr:
if maxval > v:
maxval = v
return maxval
Then I thought, I wonder if there is a faster solution that *doesn't* use
iterators, for use with older versions of Python. I came up with:
def maximum(lst):
a = []
for v in lst:
try:
if a[0] > v:
a[0] = v
except IndexError:
a.append(v)
return a[0]
Of course, it's a little ugly to use a[0] instead of "maxval". And I'm
trying not to index a list, and here I am indexing another list. So:
def maximum(lst):
for v in lst:
try:
if maxval > v:
maxval = v
except NameError:
maxval = v
return maxval
And we come at last to my actual question: Is this guaranteed to work on
all versions of Python?
In my testing on my computer, the above function works perfectly. Even
if you do this:
>>> maxval = 100
>>> print maximum(cmp, [2, 0, 3, 1])
3
In other words, you get a NameError in the function even if there is a
global variable with that name. That's with Python 2.4, however... is
this function guaranteed to work in all Python versions? I am very
comfortable with the a[0] version, since I explicitly set a to an empty
list, I know a[0] will always raise that IndexError.
P.S. I went ahead and benchmarked the above functions, plus one more
that is similar to the a[0] solution but used an empty dictionary instead
of a zero-length list. I created a list of a million zeroes, and then
stored a 1 at the beginning of the list. Thus the maximum functions would
spend all their time iterating through values and comparing them, and very
little time updating maxval or its equivalent. Times to run each function
100 times on the large list:
36.8 seconds xrange
22.3 seconds iterator
39.2 seconds IndexError on a[0]
31.5 seconds NameError with maxval
43.4 seconds KeyError on empty dictionary
The conclusions I draw from these numbers:
The sneaky tricks with exceptions don't bring enough speed to be worth
using; two of them were actually slower than the xrange solution. Even
the NameError one was only a little bit faster, and simple is better than
complex, so I don't think I'd ever actually use it.
The clear winner was the iterator version. It was much faster than the
others, and in my opinion it is simpler and easier to understand than any
of the others.
--
Steve R. Hastings "Vita est"
steve at hastings.org http://www.blarg.net/~steveha
More information about the Python-list
mailing list