[issue16001] small ints: cache string representation

Tue Sep 25 23:59:02 CEST 2012

Terry J. Reedy added the comment:

Small int caching saves both time and space. On a nearly fresh IDLE session:
>>> import sys
>>> sys.getrefcount(0)
772
>>> sys.getrefcount(1)
854
>>> sum(sys.getrefcount(i) for i in range(-5, 257))
4878
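
For readers who want to see the sharing directly, here is a minimal sketch, assuming CPython and its small-int cache covering -5 through 256 (the same range summed above). The helper name is_cached is made up for illustration; it builds the value twice at run time and checks whether both results are the same object.

```python
def is_cached(n):
    """Return True if two independently built ints equal to n are one object."""
    # Build both values at run time from a string so the compiler cannot
    # fold them into a single shared constant.
    a = int(str(n))
    b = int(str(n))
    return a is b


if __name__ == "__main__":
    # Assumption: CPython, where ints in [-5, 256] are preallocated.
    for n in (-6, -5, 0, 256, 257):
        print(n, is_cached(n))
```

Object identity for ints is an implementation detail, so other interpreters may give different answers.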

While caching string representations is an interesting idea, I do not see the same gain here, and I agree with Martin.

Array lookup *is* faster than string conversion:
>>> import timeit as ti
>>> ti.repeat(setup = "ar = [str(i) for i in range(101)]", stmt = "ar[100]")
[0.058166605132757995, 0.03438449234832762, 0.034402937150259674]
>>> ti.repeat(setup = "S = str", stmt = 'S(100)')
[0.21833603908330446, 0.19469564386039195, 0.1947128590088596]

but

1. Converting ints to decimal digits is nearly always done for output,
and conversion is blazingly fast compared to output, so output time will dominate.

>>> ti.repeat(setup = "S = str", stmt = 'S(100)', number = 20)
[1.0144641009901534e-05, 8.914987631669646e-06, 8.914987574826228e-06]
>>> ti.repeat(setup = "p = print", stmt = 'p(100)', number = 20)
...
[0.11873041968999587, 0.039060557051357137, 0.03859697769621562]

2. I presume the conversion of 0 - 9 to '0' - '9' within the conversion routines is already optimized. I don't see that 10 - 256 should be more common than 257 - 999, let alone more common than all higher ints. So the limited optimization can have only limited effect.

3. Much production numerical output is float or decimal rather than int. The 3.3 optimization that stores ASCII-only strings at one byte per character helped here.
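
To try the comparison outside an interactive session, the following is a rough sketch, not the patch under discussion. It assumes a cache over -5 through 256 held in a Python tuple; cached_str and compare are hypothetical names, and print writes to an in-memory buffer, which is much cheaper than a real console or IDLE window.

```python
import io
import timeit

# Hypothetical cache: one precomputed string per small int in [-5, 256].
_LOW, _HIGH = -5, 256
_SMALL_STRS = tuple(str(i) for i in range(_LOW, _HIGH + 1))


def cached_str(n):
    """Return str(n), using the precomputed table for small plain ints."""
    if type(n) is int and _LOW <= n <= _HIGH:
        return _SMALL_STRS[n - _LOW]
    return str(n)


def compare(number=100_000):
    """Time conversion, table lookup, and output; return best-of-3 seconds."""
    sink = io.StringIO()  # stand-in for a real output device
    env = {"cached_str": cached_str, "sink": sink}
    statements = {
        "str": "str(100)",
        "cached_str": "cached_str(100)",
        "print": "print(100, file=sink)",
    }
    return {
        name: min(timeit.repeat(stmt, globals=env, number=number, repeat=3))
        for name, stmt in statements.items()
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:>10}: {seconds:.6f}")
```

A pure-Python wrapper like this pays function-call overhead that a C-level cache would not, so it may well come out slower than plain str(); the numbers are only meant to show the relative cost of conversion and output.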

----------
nosy: +terry.reedy

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16001>
_______________________________________