[New-bugs-announce] [issue40889] Symmetric difference on dict_views is inefficient

Raymond Hettinger report at bugs.python.org
Sat Jun 6 13:09:59 EDT 2020


New submission from Raymond Hettinger <raymond.hettinger at gmail.com>:

Running "d1.items() ^ d2.items()" will rehash every key and value in both dictionaries regardless of how much they overlap.

By taking advantage of the known hashes, the analysis step could avoid making any calls to __hash__().  Only the result tuples would need to hashed.

Currently the code below calls hash for every key and value on the left and for every key and value on the right:

  >>> left = {1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7}
  >>> right = {1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 8: -8, 9: -9}
  >>> left.items() ^ right.items()        # Total work: 28 __hash__() calls
  {(6, -6), (7, -7), (8, -8), (9, -9)}

Compare that with the workload for set symmetric difference which makes zero calls to __hash__():

  >>> set(left) ^ set(right)
  {6, 7, 8, 9}

FWIW, I do have an important use case where this matters.

----------
components: Interpreter Core
messages: 370839
nosy: rhettinger, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Symmetric difference on dict_views is inefficient
type: performance
versions: Python 3.10

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue40889>
_______________________________________


More information about the New-bugs-announce mailing list