[Tutor] List comprehensions to search a list--amazing!

Tue Mar 24 02:55:46 CET 2015

On Thu, Mar 19, 2015 at 4:52 AM, Peter Otten <__peter__ at web.de> wrote:
> Dave Angel wrote:

[...]
> By the way, if you were to use a plain old loop the expected speedup over
> the listcomp would be 2. You can break out of the loop when you have found
> the gap, after iterating over one half of the list on average.
>
> So for-loops are twice as amazing ;)

Actually, this was my first approach to solving the problem.

>> The catch to a list comprehension is it has to visit all the elements,
>> while a binary search would visit log-base-2 of them.  So instead of
>> 10000 elements, you'd be searching about 14 items.
>>
>> For large lists, it'd probably be much quicker to use the bisect module.
>> https://docs.python.org/3.4/library/bisect.html
>>
>>
>> Check out bisect.bisect_left() and bisect.bisect_right()
>>
>> I don't see how to directly use those functions on a list which is
>> reverse-sorted, but the source is available.  On my install, it's
>> located at:
>>
>> /usr/lib/python3.4/bisect.py
>>
>
> To back Dave's suggestion with some empirical data here are two ways to make
> bisect() work with a descending list (though if possible I would recommend
> that you change your script to use ascending lists).

I could easily do this, though the data naturally presents itself as I
stated originally.

> $ cat reverse_bisect2.py
> import bisect
> import random
>
> def break_listcomp(L, Vt):
>     S = [i for i, V in enumerate(L) if L[i] >= Vt >= L[i + 1]]
>     return S[0]
>
> def break_bisect_reverse(L, Vt):
>     L.reverse()
>     result = bisect.bisect(L, Vt)
>     L.reverse()
>     return len(L) - result -1
>
> class Items:
>     def __init__(self, items):
>         self.items = items
>     def __len__(self):
>         return len(self.items)
>     def __getitem__(self, index):
>         return self.items[len(self.items) - 1 - index]
>
> def break_bisect_virt(L, Vt):
>     return len(L) - 1 - bisect.bisect(Items(L), Vt)
>
> random.seed(42)
> N = 10**6
> data = [random.randrange(N) for i in range(10**5)]
> data = data + data
> data.sort(reverse=True)
> sample = random.sample(data, 10)
> $
>
> break_bisect_reverse() reverses the list before and after applying bisect.
> This is still O(N), but the actual work is done in C.
>
> break_bisect_virt() wraps the actual list in a class that translates
>
> items[0] to items[len(items)-1]
> items[1] to items[len(items)-2]

Thank you for taking time to write this. I may have questions later.

> and so on, thus providing a reversed view of the list without moving any
> values. This severely slows down access to a single value, but as bisect
> needs much fewer lookups than the listcomp the overall result is still a
> massive speedup. The actual timings:
>
> $ python3 -m timeit -s 'from reverse_bisect2 import data, sample,
> break_listcomp as f' '[f(data, v) for v in sample]'
> 10 loops, best of 3: 781 msec per loop
>
> $ python3 -m timeit -s 'from reverse_bisect2 import data, sample,
> break_bisect_reverse as f' '[f(data, v) for v in sample]'
> 100 loops, best of 3: 15 msec per loop
>
> $ python3 -m timeit -s 'from reverse_bisect2 import data, sample,
> break_bisect_virt as f' '[f(data, v) for v in sample]'
> 1000 loops, best of 3: 221 usec per loop
>
> So reverse/bisect is 50 times faster than the listcomp, and
> bisect/virt is 3500 times faster than the listcomp.

You present a compelling case!

> I expect that a prepackaged linear interpolation function from numpy/scipy
> can still do better, and also handle the corner cases correctly. To use such
> a function you may have to reverse order of the values.

This is not an option for me as I would not be allowed to install numpy/scipy.

-- 
boB