Drowning in a teacup?
Peter Otten
__peter__ at web.de
Sat Apr 2 06:31:30 EDT 2016
Vito De Tullio wrote:
> Michael Selik wrote:
>
>>> > I need to scan a list of strings. If one of the elements matches the
>>> > beginning of a search keyword, that element needs to snap to the front
>>> > of the list.
>>>
>>> I know this post regards the function passing, but, on you specific
>>> problem,
>>> can't you just ... sort the list with a custom key?
>>
>> If the number of matches is small relative to the size of the list, I'd
>> expect the sort would be slower than most of the other suggestions.
>
> umm... I take the liberty to set up a little test
>
> $ cat test_sort_prefix.py
> #!/usr/bin/env python3
>
> from sys import argv
> from timeit import timeit
>
>
> def sort_in_place(l):
> def key(e):
> return not e.startswith('yes')
> l.sort(key=key)
>
>
> def original_solution(mylist):
> for i in range(len(mylist)):
> if(mylist[i].startswith('yes')):
> mylist[:] = [mylist[i]] + mylist[:i] + mylist[i+1:]
original_solution() reverses the order of the matches while sort_in_place()
doesn't. If the order is not relevant I'd try something like
def exchange(items):
pos = 0
for index, item in enumerate(items):
if item.startswith("yes"):
items[pos], items[index] = item, items[pos]
pos += 1
> def main():
> if argv[1] == 'sort_in_place':
> f = sort_in_place
> elif argv[1] == 'original_solution':
> f = original_solution
> else:
> raise Exception()
>
> nomatches = int(argv[2])
> matches = int(argv[3])
>
> l = ["no_" for _ in range(nomatches)] + ["yes_" for _ in
> range(matches)]
Python's timsort knows how to deal with (partially) sorted lists. I suggest
that you add
random.seed(42) # same list for all script invocations
random.shuffle(l)
here to spread the matches somewhat over the list.
> print(timeit("f(l)", number=100, globals={"f": f, "l": l}))
Remember that f(l) modifies l, so all but the first invocation will see a
list where all matches are at the beginning. In other words: in 99 out of
100 runs the list is already sorted.
While adding the overhead of copying the list I would expect the results of
timeit("f(l[:])", ...)
to be more realistic.
> if __name__ == '__main__':
> main()
> $ ./test_sort_prefix.py sort_in_place 1000000 3
> 33.260575089996564
> $ ./test_sort_prefix.py original_solution 1000000 3
> 35.93009542999789
> $
>
>
> in my tests there is no sensible difference between the two solutions...
> and the number of matches is risible :)
Indeed, as the number of matches rises your sort() approach will likely fare
better.
Your conclusions may be correct, but the benchmark to support them is
flawed.
More information about the Python-list
mailing list