[Tutor] Binary search question
Steven D'Aprano
steve at pearwood.info
Sat Apr 24 04:04:42 CEST 2010
On Sat, 24 Apr 2010 09:41:05 am Alan Gauld wrote:
> "Emile van Sebille" <emile at fenx.com> wrote
>
> >> For completeness sake, on a 10000 item list, using the in operator
> >> takes *in the worst case* around 7 seconds.
> >
> > Well on my system checking for the last element of a 100k item list
> > returns true almost upon hitting the enter key. Surely 7 seconds
> > for a list 1/10th the size is a typo?
>
> Interestingly on my PC with Python 3.1:
> >>> data = list(range(10000000))
> >>> 9999999 in data
>
> True
>
> >>> -5 in data
>
> False
>
> takes an apparently constant, sub second time.
With respect Alan, timing operations by eye is pretty lame. What did you
do, count your heartbeats? :)
Python makes it so easy to do accurate timings quickly. At least as
accurate as they can be on a multi-tasking operating system that can
preemptively swap processes in and out of memory as needed.
I would expect that the difference between searching for 9999999 and -5
would be insignificant, because in both cases you have to walk the
entire list. And sure enough, both take about half a second on my PC:
>>> setup = "data = list(range(10000000))"
>>> from timeit import Timer
>>> min(Timer("9999999 in data", setup).repeat(number=100))/100
0.5429409599304199
>>> min(Timer("-5 in data", setup).repeat(number=100))/100
0.4803972291946411
I wouldn't treat the difference between 0.54 and 0.48 as significant in
this case. I have about a dozen other applications running, including
at least fifty pages open in web browsers, and I continued actively
working while the timing code was running, which is a big No-No if you
want accurate results.
In any case, if you search for something near the front of the list, the
speed is quite different, because the scan stops at the first match:
>>> min(Timer("5 in data", setup).repeat(number=100))/100
4.291534423828125e-07
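That early exit is the whole story: here's a minimal sketch (my own
illustration, not CPython's actual C implementation) of the
left-to-right scan that the `in` operator performs on a list. The cost
model is the same as the real thing: hits near the front are cheap,
misses pay for the full walk.

```python
# Sketch of `in` on a list: a left-to-right linear scan that
# returns as soon as it finds a match.
def linear_contains(seq, target):
    for item in seq:
        if item == target:
            return True       # early exit: cost depends on position
    return False              # miss: every element was examined

data = list(range(10000000))
assert linear_contains(data, 5)        # found at index 5: near-instant
assert not linear_contains(data, -5)   # walks all ten million items
```

So `5 in data` above finds its match at index 5 and stops, which is why
it clocks in hundreds of nanoseconds rather than half a second.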
[...]
> So 'in' looks pretty effective to me!
It is, at least for built-in lists. O(N) might not be amazingly fast,
but it is usually fast enough. It's O(N**2) operations that you want to
watch out for!
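The classic way to end up with O(N**2) by accident (my example, not
from the thread) is nesting an O(N) `in` test inside an O(N) loop.
Swapping the list for a set fixes it without changing the logic:

```python
# Accidentally quadratic: `x not in seen` is itself an O(N) scan,
# so the whole loop is O(N**2).
def dedup_slow(items):
    seen = []
    result = []
    for x in items:
        if x not in seen:
            seen.append(x)
            result.append(x)
    return result

# Same logic with a set: average O(1) membership, so the loop is O(N).
def dedup_fast(items):
    seen = set()
    result = []
    for x in items:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

assert dedup_slow([3, 1, 3, 2, 1]) == dedup_fast([3, 1, 3, 2, 1]) == [3, 1, 2]
```

On a few hundred items you'd never notice the difference; on a few
hundred thousand, the quadratic version grinds to a halt.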
It is tempting to spend hours optimizing code to save twenty
milliseconds on a program that takes 200,000 milliseconds to run, but
there are better ways to spend your time as a programmer. Hence the
usual advice that premature optimization is the root of all evil. First
make it work, and then only if it's not fast enough, make it work
faster.
Of course, none of this is meant to say you shouldn't plan ahead and
choose the best data structure for the job!
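For instance (my own measurement, along the same lines as the timings
above), if membership testing is the hot operation, a set turns the
worst-case O(N) list scan into an average O(1) hash lookup:

```python
from timeit import Timer

# Worst-case search (-5 is absent), against a list and a set.
# 100,000 elements keeps the run short; exact numbers vary by machine.
setup = "data = list(range(100000)); dataset = set(data)"
list_time = min(Timer("-5 in data", setup).repeat(number=100)) / 100
set_time = min(Timer("-5 in dataset", setup).repeat(number=100)) / 100

# The list scan walks all 100,000 items; the set lookup hashes once.
assert set_time < list_time
```

On my machine the set lookup comes out thousands of times faster, which
is exactly the kind of win you get for free by picking the right
structure up front.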
--
Steven D'Aprano