[Tutor] List comprehensions to search a list--amazing!

Peter Otten __peter__ at web.de
Thu Mar 19 10:52:51 CET 2015


Dave Angel wrote:

> On 03/19/2015 12:20 AM, boB Stepp wrote:
>> I hope extolling the beauty and power of Python on this list is
>> allowed, because I have had a large "WOW!!!" moment tonight. I had a
>> problem I was working on at work this afternoon. I have a list of ~
>> 10,000 floating point numbers, which run from largest to smallest.
>> There are duplicates scattered throughout, so I might have something
>> like: [5942.6789, 5942.6789, 5941.000003, 5941.01, 5941.01, ... ],
>> etc. I wanted to search the list for a test value, which, depending on
>> the particular list (I can have many such lists, each different from
>> the other.), could conceivably be anywhere within the given list. I
>> needed to return the index where the list values change from being
>> just greater than the test value to just less than the test value at
>> the very next index position. I spent a good chunk of my afternoon
>> writing a binary search function and wondering what theoretically the
>> optimum search algorithm would be, got interrupted (as usual on this
>> project), and decided to look at my books at home to see if a better
>> solution would be staring at me from some book (Like there usually
>> is!).
>>
>> I haven't studied list comprehensions formally yet, but a snippet of
>> code in a book caught my eye where the author was discussing filtering
>> data in a list. This led me to try:
>>
>> The generalized problem:
>>
>> L = [V0, V1, ..., Vn], where V0 >= V1 >= V2 >= ... >= Vn .
>> Find index i, such that V[i] >= Vt >= V[i + 1], where Vt is the test
>> value being searched for. I need to know the indices i and i + 1,
>> which I need to interpolate based on where Vt falls.
>>
>> The solution (As a sublist, S)  I worked out tonight after
>> experimenting with comprehension syntax is:
>> S = [i for i, V in enumerate(L) if L[i] >= Vt >= L[i + 1]]
>>
>> And, of course, the index i I need is:
>> i = S[0]
>>
>> I tested this out with concrete examples in the interpreter, such as
>> with a list, L:
>>
>> L = [item for item in range(10000, 0, -1)]
>>
>> and trying different test values. It was blazingly fast, too!
>>
>> All I can say is: WOW!!!

By the way, if you were to use a plain old loop the expected speedup over 
the listcomp would be 2. You can break out of the loop when you have found 
the gap, after iterating over one half of the list on average.

So for-loops are twice as amazing ;)

> That's very innovative.
> 
> The catch to a list comprehension is it has to visit all the elements,
> while a binary search would visit log-base-2 of them.  So instead of
> 10000 elements, you'd be searching about 14 items.
> 
> For large lists, it'd probably be much quicker to use the bisect module.
> https://docs.python.org/3.4/library/bisect.html
> 
> 
> Check out bisect.bisect_left() and bisect.bisect_right()
> 
> I don't see how to directly use those functions on a list which is
> reverse-sorted, but the source is available.  On my install, it's
> located at:
> 
> /usr/lib/python3.4/bisect.py
> 

To back Dave's suggestion with some empirical data here are two ways to make 
bisect() work with a descending list (though if possible I would recommend 
that you change your script to use ascending lists).

$ cat reverse_bisect2.py
import bisect
import random

def break_listcomp(L, Vt):
    S = [i for i, V in enumerate(L) if L[i] >= Vt >= L[i + 1]]
    return S[0]

def break_bisect_reverse(L, Vt):
    L.reverse()
    result = bisect.bisect(L, Vt)
    L.reverse()
    return len(L) - result -1

class Items:
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, index):
        return self.items[len(self.items) - 1 - index]

def break_bisect_virt(L, Vt):
    return len(L) - 1 - bisect.bisect(Items(L), Vt)

random.seed(42)
N = 10**6
data = [random.randrange(N) for i in range(10**5)]
data = data + data
data.sort(reverse=True)
sample = random.sample(data, 10)
$

break_bisect_reverse() reverses the list before and after applying bisect. 
This is still O(N), but the actual work is done in C.

break_bisect_virt() wraps the actual list in a class that translates

items[0] to items[len(items)-1]
items[1] to items[len(items)-2]

and so on, thus providing a reversed view of the list without moving any 
values. This severely slows down access to a single value, but as bisect 
needs much fewer lookups than the listcomp the overall result is still a 
massive speedup. The actual timings:

$ python3 -m timeit -s 'from reverse_bisect2 import data, sample, 
break_listcomp as f' '[f(data, v) for v in sample]'
10 loops, best of 3: 781 msec per loop

$ python3 -m timeit -s 'from reverse_bisect2 import data, sample, 
break_bisect_reverse as f' '[f(data, v) for v in sample]'
100 loops, best of 3: 15 msec per loop

$ python3 -m timeit -s 'from reverse_bisect2 import data, sample, 
break_bisect_virt as f' '[f(data, v) for v in sample]'
1000 loops, best of 3: 221 usec per loop

So reverse/bisect is 50 times faster than the listcomp, and
bisect/virt is 3500 times faster than the listcomp.

I expect that a prepackaged linear interpolation function from numpy/scipy 
can still do better, and also handle the corner cases correctly. To use such 
a function you may have to reverse order of the values.




More information about the Tutor mailing list