[issue23119] Remove unicode specialization from set objects

Raymond Hettinger report at bugs.python.org
Sat Dec 27 11:50:11 CET 2014


New submission from Raymond Hettinger:

This tracker item records experiments with removing the unicode specialization code from set objects, along with timings to determine the performance benefits or losses from those specializations.

* Removes the set_lookkey_unicode() function and the attendant so->lookup indirections.  That saves 60 lines of code.  On each lookup, it saves one indirection for the lookup dispatch, but in the case of unicode-only tables, it costs an additional indirection through the abstract API for PyObject_RichCompareBool.

* Removes the specialization code in the add, discard, and contains functions that checks for a unicode key with an already computed hash value.  This saves a type check (cheap), a hash field check, and nine lines of code.  In the case where the hash value would have already been computed, it costs a call to PyObject_Hash (which has an indirection, but otherwise does the same field test that we are doing).  The working hypothesis is that this specialization code saves only a little in cases where it applies and adds a little to all the cases where it does not apply.  (Note, the use cases for sets are less likely than dicts to be looking up strings whose hash value has already been computed.)
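
The trade-off in the second bullet can be modeled in pure Python.  This is only a rough sketch, not the C code in setobject.c: CountingKey, hash_specialized, and hash_generic are hypothetical names invented for illustration, and the cached_hash attribute merely imitates the cached hash field of a str object, with -1 meaning "not yet computed".

```python
class CountingKey:
    """Hypothetical key type that records how often the generic hash path runs."""

    def __init__(self, value):
        self.value = value
        self.cached_hash = -1   # -1 stands for "hash not computed yet"
        self.hash_calls = 0     # number of trips through the generic path

    def __hash__(self):
        # Stand-in for the generic call: it has its own indirection,
        # then performs the same cached-field test.
        self.hash_calls += 1
        if self.cached_hash == -1:
            self.cached_hash = hash(self.value)
        return self.cached_hash


def hash_specialized(key):
    """Model of the specialized path: type check plus cached-field check."""
    if type(key) is CountingKey and key.cached_hash != -1:
        return key.cached_hash      # fast path, no generic call
    return hash(key)                # every other key pays for the failed checks


def hash_generic(key):
    """Model of the simplified path: always go through the generic call."""
    return hash(key)


key = CountingKey("spam")
first = hash_generic(key)        # computes and caches the hash
second = hash_specialized(key)   # served from the cache, no generic call
```

In this model the specialized path avoids one generic call only when the key has the exact expected type and its hash is already cached; every other key pays for the two checks and then takes the generic call anyway.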

----------------------

Here are some initial timings for the first patch.  They seem to show that intersection benefits slightly and that set creation time is unaffected.

$ ./time_suite.sh 
100000 loops, best of 3: 14.9 usec per loop
100000 loops, best of 3: 15.3 usec per loop
1000000 loops, best of 3: 1.17 usec per loop
1000000 loops, best of 3: 1.13 usec per loop
10000 loops, best of 3: 24.9 usec per loop
10000 loops, best of 3: 24.2 usec per loop

$ ./time_suite.sh 
100000 loops, best of 3: 14.7 usec per loop
100000 loops, best of 3: 14.6 usec per loop
1000000 loops, best of 3: 1.16 usec per loop
1000000 loops, best of 3: 1.07 usec per loop
10000 loops, best of 3: 23.1 usec per loop
10000 loops, best of 3: 23.4 usec per loop

$ ./time_suite.sh 
100000 loops, best of 3: 14.5 usec per loop
100000 loops, best of 3: 14.5 usec per loop
1000000 loops, best of 3: 1.16 usec per loop
1000000 loops, best of 3: 1.17 usec per loop
10000 loops, best of 3: 22.5 usec per loop
10000 loops, best of 3: 22 usec per loop
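
The contents of time_suite.sh are not shown in this message.  As a rough sketch of how comparable measurements could be taken, the following uses the standard timeit module to report the best of three repetitions in microseconds per loop.  The three workloads, the data sizes, and the helper name best_usec are assumptions chosen for illustration and may differ from what the actual script runs.

```python
import timeit

# Assumed setup: two overlapping sets of short strings.
SETUP = """
words = [str(i) for i in range(200)]
s = set(words)
t = set(words[100:] + ['x', 'y'])
"""

# Assumed workloads; the real suite may measure different statements.
BENCHMARKS = {
    "set creation": "set(words)",
    "membership": "'150' in s",
    "intersection": "s & t",
}


def best_usec(stmt, setup=SETUP, number=1000, repeat=3):
    """Return the best-of-`repeat` time per loop, in microseconds."""
    timer = timeit.Timer(stmt, setup=setup)
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6


results = {name: best_usec(stmt) for name, stmt in BENCHMARKS.items()}
for name, usec in results.items():
    print(f"{name}: best of 3: {usec:.3g} usec per loop")
```

Running the same script against builds with and without the patch, several times each, gives the kind of side-by-side numbers listed above.  Taking the minimum of the repetitions reduces noise from other activity on the machine.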

----------
assignee: rhettinger
components: Interpreter Core
files: one_lookkey.diff
keywords: patch
messages: 233128
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Remove unicode specialization from set objects
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file37547/one_lookkey.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23119>
_______________________________________

