[issue31484] Cache single-character strings outside of the Latin1 range

Sun Sep 17 12:10:50 EDT 2017

Xiang Zhang added the comment:

I ran the patch against a toy NLP application that segments words from the Shui Hu Zhuan text provided by Serhiy. The result is not bad: 6% faster. I also counted the hit rate: 90% hit cell 0, 4.5% hit cell 1, 5.5% miss. I then increased the cache size to 1024 * 2. Although the hit rate rose to 95.4%, 2.1%, 2.4% respectively, the difference was still 6%.
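For reference, the kind of counting described above can be sketched in Python as below. This is only a rough model, not the C code from the patch: the name simulate, the bucket layout, the promote-on-second-cell-hit policy, and skipping code points below 256 are all my assumptions for illustration.

```python
from collections import Counter

CACHE_SIZE = 1024  # assumed number of buckets; must be a power of two


def simulate(text, size=CACHE_SIZE):
    """Count cell-0 hits, cell-1 hits and misses for a cache with
    two cells per bucket, indexed by the low bits of the code point."""
    buckets = [[None, None] for _ in range(size)]
    stats = Counter()
    for ch in text:
        code = ord(ch)
        if code < 256:
            # Assumption: Latin1 characters are served by the existing cache.
            continue
        cell = buckets[code & (size - 1)]
        if cell[0] == code:
            stats["hit0"] += 1
        elif cell[1] == code:
            stats["hit1"] += 1
            # Assumed policy: promote the entry to the first cell.
            cell[0], cell[1] = cell[1], cell[0]
        else:
            stats["miss"] += 1
            # Evict the older entry and put the new one in front.
            cell[1] = cell[0]
            cell[0] = code
    return stats
```

Calling simulate(text) on a corpus and dividing each counter by the total gives percentages comparable in shape to the ones quoted above; passing size=2048 models the larger cache.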

So IMHO this patch will hardly affect real-world applications *much*, for better or worse. I can't clearly recall the unicode implementation, but why can't we reuse the latin1 cache when we use this BMP cache? And then, to avoid the chars' low bits conflicting with ASCII chars' low bits, we have to introduce the mini-LRU-cache, which is not that easy to understand.
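To make the low-bits conflict concrete, here is a tiny Python sketch. It assumes a hypothetical table of 256 slots indexed purely by the low 8 bits of the code point; the names SLOTS and slot are illustrative and this is not necessarily how the patch indexes its cache.

```python
SLOTS = 256  # assumed table size, matching the Latin1 range


def slot(ch):
    # Index by the low 8 bits of the code point (illustrative scheme only).
    return ord(ch) & (SLOTS - 1)


ascii_char = "a"     # U+0061
bmp_char = "\u4e61"  # U+4E61, a CJK character whose low byte is also 0x61

# Both characters land in the same slot, so one would evict the other
# in a cache with a single cell per slot.
print(hex(slot(ascii_char)), hex(slot(bmp_char)))
```

In text that mixes ASCII punctuation with BMP characters, such pairs would keep evicting each other, which is presumably what the second cell is meant to absorb.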

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue31484>
_______________________________________