[Tutor] Binary search question

Steven D'Aprano steve at pearwood.info
Sat Apr 24 04:04:42 CEST 2010


On Sat, 24 Apr 2010 09:41:05 am Alan Gauld wrote:
> "Emile van Sebille" <emile at fenx.com> wrote
>
> >> For completeness sake, on a 10000 item list, using the in operator
> >> takes *in the worst case* around 7 seconds.
> >
> > Well on my system checking for the last element of a 100k item list
> > returns true almost upon hitting the enter key.  Surely 7 seconds
> > for a list 1/10th the size is a typo?
>
> Interestingly on my PC with Python 3.1:
> >>> data = list(range(10000000))
> >>> 9999999 in data
>
> True
>
> >>> -5 in data
>
> False
>
> takes an apparently constant, sub second time.

With respect Alan, timing operations by eye is pretty lame, what did you 
do, count your heartbeats? :)

Python makes it so easy to do accurate timings quickly. At least as 
accurate as they can be on a multi-tasking operating system that can 
preemptively swap processes in and out of memory as needed.

I would expect that the difference between searching for 9999999 and -5 
would be insignificant, because in both cases you have to walk the 
entire list. And sure enough, both take about half a second on my PC:


>>> setup = "data = list(range(10000000))"
>>> from timeit import Timer
>>> min(Timer("9999999 in data", setup).repeat(number=100))/100
0.5429409599304199
>>> min(Timer("-5 in data", setup).repeat(number=100))/100
0.4803972291946411

I wouldn't treat the difference between 0.54 and 0.48 as significant in 
this case. I have about a dozen other applications running, including 
at least fifty pages open in web browsers, and I continued actively 
working while the timing code was running, which is a big No-No if you 
want accurate results.

In any case, if you search for something else, the speed is quite 
different:

>>> min(Timer("5 in data", setup).repeat(number=100))/100
4.291534423828125e-07


[...]
> So 'in' looks pretty effective to me!

It is, at least for built-in lists. O(N) might not be amazingly fast, 
but it is usually fast enough. It's O(N**2) operations that you want to 
watch out for!

It is tempting to speed hours optimizing code to save twenty 
milliseconds on a program that takes 200,000 milliseconds to run, but 
there are better ways to spend your time as a programmer. Hence the 
usual advice that premature optimization is the root of all evil. First 
make it work, and then only if it's not fast enough, make it work 
faster.

Of course, none of this is meant to say you shouldn't plan ahead and 
choose the best data structure for the job!


-- 
Steven D'Aprano


More information about the Tutor mailing list