[issue43198] Operations on sets more than hundred times less efficient with python3.9 than with previous versions

Raymond Hettinger report at bugs.python.org
Thu Feb 11 21:45:54 EST 2021


Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:

Addenda.  Top use cases for sets (roughly in order of commonality):

1. Eliminate duplicates from an iterable input and loop over the result.
2. Store elements in a set just once but do many membership tests.
3. Perform data analysis on multiple sets using union, intersection, difference, and isdisjoint.
4. Remove elements from a set, one at a time, until it is empty.
5. Algorithms and that alternately add and remove different elements over time (toposort, NFA, DFA, etc).  

The first three are all penalized by an extra inner loop test and the loss of the register to track the freeslot.  Those use cases get no offsetting benefit.

Case 4 doesn't exercise the dummy reuse at all.  So, it is unaffected.

The last is approximately a net wash.  It pays the inner loop price, suffers the loss of a register, and incurs an extra test outside the loop, but it occasionally gets lucky and reuses a freeslot. The benefit for reuse is that it is delays the inevitable resize which would have reclaimed the dummy entries earlier. (The comparable use case for dicts is LRU/LFU caches where new entries are added at the same rate as old entries are removed.  That use case also showed a net wash when freeslot tracking was removed from dicts).

Not on the list at all is the case of a large set, where exactly the same element is added and removed in a tight loop.  The payoff for this case is that the O(n) resize step never occurs; however, this case is so rare that there is no reason to preference it over the common cases.  It now has the same performance as case 5.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43198>
_______________________________________


More information about the Python-bugs-list mailing list