range() is not the best way to check range?

Simon Forman rogue_pedro at yahoo.com
Tue Jul 18 16:28:29 EDT 2006


K.S.Sreeram wrote:
> Simon Forman wrote:
> > Nick Craig-Wood wrote:
> >> Sets are pretty fast too, and have the advantage of flexibility in
> >> that you can put any numbers in you like
> >>
> >
> > I know this is self-evident to most of the people reading this, but I
> > thought it worth pointing out that this is a great way to test
> > membership in range(lo, hi, step) without doing "the necessary
> > algebra".
> >
> > i.e.  n in set(xrange(0, 10000, 23)) ...
>
> This is very very misleading... here are some timings :

Yes it is. I'm sorry about that.

> python -mtimeit "n=5000" "n in set(xrange(0,10000))"
> 1000 loops, best of 3: 1.32 msec per loop
>
> python -mtimeit "n=5000" "n in xrange(0,10000)"
> 1000 loops, best of 3: 455 usec per loop
>
> python -mtimeit "n=5000" "0 <= n < 10000"
> 1000000 loops, best of 3: 0.217 usec per loop
>
> sets are fast only if you create them *once* and use them again and
> again. even in that case, the sets use up O(n) memory.

That's what I meant.  But I didn't state it clearly.

One of the things I like most about python is that it allows you to
specify the problem that you want to solve without a great deal of
difficulty as to *how* to specify it.  To me, and perhaps others, "T =
set(xrange(0, 10000, 23))" and "n in T"  are somewhat easier to read
and write than "not n % 23 and 0 <= n < 10000", YMMV.

In the given case a set of ~(10000 / 23) ints would not usually be too
burdensome on ram, and the code runs close to the same speed as
compared to the direct calculation:

from timeit import Timer

times = 100000
Max = 10000
n = 5000
T = set(xrange(0, Max, 23))

s1 = 'n in T'
s2 = 'not n %% 23 and 0 <= n < %s' % Max

setup = 'from __main__ import n, T'


S1 = Timer(s1, setup).repeat(number=times)
S2 = Timer(s2, setup).repeat(number=times)


print "%.3f usec/pass" % (1000000 * min(S1) / times)
print "%.3f usec/pass" % (1000000 * min(S2) / times)

On my machine this printed:
0.476 usec/pass
0.552 usec/pass


>
> with comparison operators, you don't need extra memory *and* there is no
> pre-computation required.

When I set Max = 100000000 in the above test code there was serious
disk thrashing...  ;-)

>
> [sreeram;]
>


FWIW, in production code I would certainly use the comparison
operators.  A kilobyte saved is a kilobyte earned.

Peace,
~Simon




More information about the Python-list mailing list