[issue18962] Add special case for single iterator in heapq.merge function
Wouter Bolsterlee
report at bugs.python.org
Sat Sep 7 20:26:28 CEST 2013
New submission from Wouter Bolsterlee:
The heapq.merge() function merges multiple sorted iterables into a single sorted output. The function uses a heap queue that is repeatedly looped over until it has generated all output.
If only a single iterable is passed to heapq.merge(), the heap will have len(h) == 1, which means the looping and heap manipulation just degrades to a convoluted "yield from iterables[0]". This negatively impacts performance. The attached patch adds a short-circuit for this single input case.
The behaviour of the function remains unchanged:
>>> list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
>>> list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
[0, 1, 2, 3, 4, 5, 5, 7, 8, 10, 15, 20, 25]
For multiple inputs, there is no measurable performance difference (measured
using IPython's %timeit). Without patch:
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.4 µs per loop
>>> %timeit list(merge_before([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.5 µs per loop
With patch:
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.7 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.6 µs per loop
>>> %timeit list(merge_after([1,3,5,7], [0,2,4,8], [5,10,15,20], [], [25]))
100000 loops, best of 3: 13.8 µs per loop
The real performance gain is of course when only a single iterable is passed. Without patch:
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.65 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.52 ms per loop
>>> %timeit for x in merge_before(range(10000)): pass
100 loops, best of 3: 2.51 ms per loop
With patch:
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 604 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 609 µs per loop
>>> %timeit for x in merge_after(range(10000)): pass
1000 loops, best of 3: 603 µs per loop
This is a ~4x speedup for an input consisting of 10000 items. Compared to plain iteration:
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 263 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 268 µs per loop
>>> %timeit for x in range(10000): pass
1000 loops, best of 3: 273 µs per loop
Without the patch, heapq.merge() is ~10x as slow as plain iteration. With the patch, this is reduced to ~2.5x.
(Note: all testing done using Python 3.3.2 on a 64bit Linux machine.)
----------
components: Library (Lib)
files: heapq-merge-single-input-optimization.patch
keywords: patch
messages: 197174
nosy: wbolster
priority: normal
severity: normal
status: open
title: Add special case for single iterator in heapq.merge function
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file31649/heapq-merge-single-input-optimization.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18962>
_______________________________________
More information about the Python-bugs-list
mailing list