Newbie: finding the key/index of the min/max element
Armin Steinhoff
a-steinhoff at web.de
Sun May 5 12:54:24 EDT 2002
Tim Peters <tim.one at comcast.net> wrote in message news:<mailman.1020590831.7588.python-list at python.org>...
> [David Eppstein]
> > ...
>
Have someone tried to speed it up wit Pyrex ??
Armin
>
> # Find the rank'th-smallest value in a, in worst-case quadratic time.
> def short_find(a, rank):
> a.sort()
> return a[rank - 1]
>
> # Find the rank'th-smallest value in a, in worst-case linear time.
> def find(a, rank):
> n = len(a)
> assert 1 <= rank <= n
> if n <= 7:
> return short_find(a, rank)
>
> # Find median of median-of-7's.
> medians = [short_find(a[i : i+7], 4) for i in xrange(0, n-6, 7)]
> median = find(medians, (len(medians) + 1) // 2)
>
> # Partition around the median.
> # a[:i] <= median
> # a[j+1:] >= median
> i, j = 0, n-1
> while i <= j:
> while a[i] < median:
> i += 1
> while a[j] > median:
> j -= 1
> if i <= j:
> a[i], a[j] = a[j], a[i]
> i += 1
> j -= 1
>
> if rank <= i:
> return find(a[:i], rank)
> else:
> return find(a[i:], rank - i)
>
> def tryone(n, tries):
> from random import random
> from time import clock as now
>
> x = [None] * n
> median_rank = (n+1) // 2
> sum1 = sum2 = 0.0
> for i in range(tries):
> for i in xrange(n):
> x[i] = random()
> y = x[:]
>
> start = now()
> got1 = find(x, median_rank)
> elapsed = now() - start
> sum1 += elapsed
>
> start = now()
> got2 = short_find(y, median_rank)
> elapsed = now() - start
> sum2 += elapsed
>
> if got1 != got2:
> print "OUCH got1 %r got2 %r" % (got1, got2)
>
> return sum1, sum2
>
> def drive(tries):
> for i in range(23):
> n = 2**i + 1
> fast, slow = tryone(n, tries)
> print "%8d %7.3f %7.3f" % (n, fast/tries, slow/tries)
>
> if __name__ == "__main__":
> drive(3)
>
>
> PS: If you're more interested in expected time than worst-case time,
> replacing the median-of-median-of-7s business with
>
> median = short_find([a[0], a[-1], a[n//2]], 2)
>
> yields a Python median-finder that's usually significantly faster than the
> sort method starting in the range of 512K to 1M elements.
>
> PPS: If you do care about worst-case time, the Python version can be made
> significantly faster by boosting the median-of-7 gimmick to median-of-k for
> larger odd values of k. Fun for the whole family: plot speedups against k
> until you're all baffled <wink>.
More information about the Python-list
mailing list