[issue4356] Add "key" argument to "bisect" module functions

Rémi Lapeyre report at bugs.python.org
Mon Feb 18 10:13:33 EST 2019


Rémi Lapeyre <remi.lapeyre at henki.fr> added the comment:

Hi everybody, I opened PR 11781 to add a key argument to functions in the bisect
module. I agree with @dmtr's points that this addition is not a bad design.

As far as I can tell, the key function is at called at most once per item as this
example where an assertion would break shows:


    import bisect
    from collections import defaultdict


    class Test:
        def __init__(self, value):
            self.value = value


    cache = defaultdict(int)

    def key(e):
        cache[e] += 1
        assert cache[e] <= 1
        return e.value


    l = [Test(i) for i in range(10000)]

    bisect.bisect(l, Test(25), key=key)


    ➜  cpython git:(add-key-argument-to-bisect) ./python.exe
    Python 3.8.0a1+ (heads/add-key-argument-to-bisect:b7aaa1adad, Feb  7 2019, 17:33:24) 
    [Clang 10.0.0 (clang-1000.10.44.4)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import bisect
    >>> from collections import defaultdict
    >>> 
    >>> 
    >>> class Test:
    ...     def __init__(self, value):
    ...         self.value = value
    ... 
    >>> 
    >>> cache = defaultdict(int)
    >>> 
    >>> def key(e):
    ...     cache[e] += 1
    ...     assert cache[e] <= 1
    ...     return e.value
    ... 
    >>> 
    >>> l = [Test(i) for i in range(10000)]
    >>> 
    >>> bisect.bisect(l, Test(25), key=key)
    26


This argument can be used where the objects are immutable and I have not been able
to see changes in bisect speed using Victor Stinner perf module:

(cpython-venv) ➜  cpython git:(add-key-argument-to-bisect) ✗ make distclean && ./configure --with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj && python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json
(cpython-venv) ➜  cpython git:(add-key-argument-to-bisect) ✗ git checkout cd90f6a369
(cpython-venv) ➜  cpython git:(cd90f6a369) ✗ make distclean && ./configure --with-pydebug --with-openssl=$(brew --prefix openssl) && make -sj && python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25)" -o $(git rev-parse --short HEAD).json
(cpython-venv) ➜  cpython git:(cd90f6a369) ✗ python -m perf compare_to cd90f6a369.json b7aaa1adad.json
Mean +- std dev: [cd90f6a369] 36.2 us +- 1.0 us -> [b7aaa1adad] 35.7 us +- 0.5 us: 1.01x faster (-1%)

(cd90f6a369 was somtime faster than b7aaa1adad, sometime slower but they always
were less than one std dev from one another)

As I wrote in the discussion of the PR, I suspect the branch predictor to predict
reliably the branching in the hot path (though I don't know much about that and
would love some input).


For the record, here is the performance when a key function is given:

(cpython-venv) ➜  cpython git:(add-key-argument-to-bisect) ✗ python -m perf timeit --rigorous -s "from bisect import bisect" "bisect(range(1_000_000_000_000_000), 25, key=lambda e: e)"

.........................................
Mean +- std dev: 59.3 us +- 1.0 us

It seems to me that adding the key parameter is the best solution possible.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue4356>
_______________________________________


More information about the Python-bugs-list mailing list