[issue28509] dict.update allocates too much
Serhiy Storchaka
report at bugs.python.org
Tue Oct 25 17:10:17 EDT 2016
Serhiy Storchaka added the comment:
LGTM.
Here is a script that produces more compact output. The first column is a size after which the dict is resized.
import sys
p = 10000
b = {}
for i in range(10000):
a = {}
b[i] = i
a.update(b)
s = sys.getsizeof(a)
if s > p:
print(i, p)
p = s
Unpatched:
5 128
7 196
15 344
31 628
63 1208
127 2612
255 5176
511 10292
1023 20536
2047 41012
4095 81976
8191 163892
Patched:
5 128
10 196
21 344
42 628
85 1208
170 2612
341 5176
682 10292
1365 20536
2730 41012
5461 81976
But I suggest instead the condition
mp->ma_keys->dk_usable < other->ma_used
use the condition
mp->ma_used + mp->ma_keys->dk_usable < other->ma_used
If there are overlapping keys this can allow to avoid resizing. In worst keys one resizing will be happened in dictinsert().
Yes one estimation is:
USABLE_FRACTION(2 * mp->ma_keys->dk_size) < mp->ma_used + other->ma_used
Dict size is at least doubled after resizing. No need to make preliminary resizing if the final result is the same. The benefit is that if there are many overlapping keys the final size can be ESTIMATE_SIZE(other->ma_used) instead of ESTIMATE_SIZE(mp->ma_used + other->ma_used).
All this conditions can be combined (because they have different computational cost):
mp->ma_keys->dk_usable < other->ma_used &&
mp->ma_used + mp->ma_keys->dk_usable < other->ma_used &&
USABLE_FRACTION(2 * mp->ma_keys->dk_size) < mp->ma_used + other->ma_used
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28509>
_______________________________________
More information about the Python-bugs-list
mailing list