[issue23119] Remove unicode specialization from set objects

Fri Jan 9 10:04:45 CET 2015

Marc-Andre Lemburg added the comment:

On 09.01.2015 09:33, Raymond Hettinger wrote:
> 
> I'm withdrawing this one. After more work trying many timings on multiple compilers and various sizes and kinds of datasets, it appears that the unicode specialization is still worth it.  
> 
> The cost of the lookup indirection appears to be completely insignificant (i.e. it doesn't harm the non-unicode case), while the unicode-specialized lookup has measurable benefits in the use case of deduping an iterable of strings.

Thanks, Raymond, for the additional testing :-)

I grepped the Python C source code, and it seems that the only
even mildly performance-relevant use of sets is in Python/symtable.c
(which, IIRC, is used by the bytecode compiler), and those sets
have Unicode strings as members.

The stdlib uses sets with both Unicode strings and integers
as members. From looking at the grep hits, it seems that Unicode
strings are more commonly used than integers in the stdlib
as set members, e.g. for method names, module names and character
sets.
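
For anyone who wants to repeat this kind of survey more mechanically
than with grep, here is a rough sketch. It is not what I ran; the
function name classify_set_literals is made up for illustration, and
it assumes Python 3.8+ (for ast.Constant). It only sees set displays
such as {'a', 'b'}, not set(...) or frozenset(...) calls, so it
undercounts:

```python
import ast
from collections import Counter


def classify_set_literals(source: str) -> Counter:
    """Count set displays in `source` by the type of their members.

    Hypothetical helper: each set display is tallied as "str", "int",
    "other" (non-constant or other constant members) or "mixed".
    """
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Set):
            continue
        kinds = set()
        for elt in node.elts:
            if isinstance(elt, ast.Constant) and isinstance(elt.value, str):
                kinds.add("str")
            elif (isinstance(elt, ast.Constant)
                  and isinstance(elt.value, int)
                  and not isinstance(elt.value, bool)):
                # bool is a subclass of int, so exclude it explicitly
                kinds.add("int")
            else:
                kinds.add("other")
        # A set display always has at least one element ({} is a dict).
        counts[kinds.pop() if len(kinds) == 1 else "mixed"] += 1
    return counts


sample = "a = {'read', 'write'}\nb = {1, 2, 3}\nc = {1, 'x'}\nd = set(names)\n"
result = classify_set_literals(sample)
```

Running this over each .py file in Lib/ and summing the counters
would give a rough str-vs-int tally to compare against the grep hits.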

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue23119>
_______________________________________