Sorting by item_in_another_list
Cameron Walsh
cameron.walsh at gmail.com
Thu Oct 26 22:58:20 EDT 2006
Steven Bethard wrote:
>
> As you noted, you'll get an error if you try to concatenate B as a set
> to the list.
>
> Steve
Whoops, remind me to check things work before I type them. In the mean
time, here are some more interesting timing results:
With a larger data set, 500 elements instead of 100, the times were
almost the same:
python -m timeit -s "A=range(500); B=range(40,60,2); B_set=set(B)"
"A=B+list(x for x in A if x not in B_set)"
10000 loops, best of 3: 103 usec per loop
python -m timeit -s "A=range(500); B=range(40,60,2); B_set = set(B)"
"A.sort(key=B_set.__contains__, reverse=True)"
10000 loops, best of 3: 99.2 usec per loop
But for the last recommended one:
python -m timeit -s "A=range(500); B=range(40,60,2); C = set(A) -set(B)"
"A=B+filter(C.__contains__, A)"
10000 loops, best of 3: 38.3 usec per loop
Much faster! But that does not take into account the set creation and
set difference calculations. Let's try including that:
python -m timeit -s "A=range(500); B=range(40,60,2)" "C=set(A)-set(B)"
"a=B+filter(C.__contains__, A)"
10000 loops, best of 3: 105 usec per loop
and compare to our other two versions including set creation:
python -m timeit -s "A=range(500); B=range(40,60,2)" "B_set = set(B);
A=B+list(x for x in A if X not in B_set"
10000 loops, best of 3: 105 usec per loop
python -m timeit -s "A=range(500); B=range(40,60,2)" "B_set = set(B);
A.sort(key=B_set.__contains__, reverse=True)"
10000 loops, best of 3: 101 usec per loop
Pretty similar again.
However, converting each element to a string first ("A=list(str(x) for x
in range(500)" etc.), making it more similar to my data, gave:
102usec per loop for the A.sort() method
132usec per loop for the A=B+filter() method
104usec per loop for the A=B+list() method
It is interesting to note the increased time taken by the set
difference/filter method. It appears not to like strings as much as
ints. We can also shave another 7 usec off the filter time:
python -m timeit -s "A=list(str)x) for x in range(500)); B=list(str(x)
for x in range(40,60,2)); not_in_B_set=lambda x: x not in B_set"
"B_set=set(B); A=B+filter(not_in_B_set, A)"
10000 loops, best of 3: 125 usec per loop
In conclusion, it appears I should use the clearest of all the methods,
since for the data sizes I am dealing with, they are of approximately
the same speed, and the time taken is negligible anyway.
Thanks for all your help,
Cameron.
More information about the Python-list
mailing list