Performance of int/long in Python 3

Mon Mar 25 17:51:07 EDT 2013

The Python 3 merge of int and long has effectively penalized
small-number arithmetic by removing an optimization. As we've seen
from PEP 393 strings (jmf aside), there can be huge benefits from
having a single type with multiple representations internally. Is
there value in making the int type have a machine-word optimization in
the same way?

The cost is clear. Compare these methods for calculating the sum of
all numbers up to 65535, which stays under 2^31:

def range_sum(n):
	return sum(range(n+1))

def forloop(n):
	tot=0
	for i in range(n+1):
		tot+=i
	return tot

def forloop_offset(n):
	tot=1000000000000000
	for i in range(n+1):
		tot+=i
	return tot-1000000000000000

import timeit
import sys
print(sys.version)
print("inline: %d"%sum(range(65536)))
print(timeit.timeit("sum(range(65536))",number=1000))
for func in ['range_sum','forloop','forloop_offset']:
	print("%s: %r"%(func,(globals()[func](65535))))
	print(timeit.timeit(func+"(65535)","from __main__ import "+func,number=1000))

Windows XP:
C:\>python26\python inttime.py
2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)]
inline: 2147450880
2.36770455463
range_sum: 2147450880
2.61778550067
forloop: 2147450880
7.91409131608
forloop_offset: 2147450880L
23.3116954809

C:\>python33\python inttime.py
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]
inline: 2147450880
5.25038713020789
range_sum: 2147450880
5.412975112758745
forloop: 2147450880
17.875799577879313
forloop_offset: 2147450880
19.31672544974291

Debian Wheezy:
rosuav at sikorsky:~$ python inttime.py
2.7.3 (default, Jan  2 2013, 13:56:14)
[GCC 4.7.2]
inline: 2147450880
1.92763710022
range_sum: 2147450880
1.93409109116
forloop: 2147450880
5.14633893967
forloop_offset: 2147450880
5.13459300995
rosuav at sikorsky:~$ python3 inttime.py
3.2.3 (default, Feb 20 2013, 14:44:27)
[GCC 4.7.2]
inline: 2147450880
2.884124994277954
range_sum: 2147450880
2.6586129665374756
forloop: 2147450880
7.660192012786865
forloop_offset: 2147450880
8.11817193031311

On 2.6/2.7, there's a massive penalty for switching to longs; on
3.2/3.3, the two for-loop versions are nearly identical in time.

(Side point: I'm often seeing that 3.2 on Linux is marginally faster
calling my range_sum function than doing the same thing inline. I do
not understand this. If anyone can explain what's going on there, I'm
all ears!)

Python 3's int is faster than Python 2's long, but slower than Python
2's int. So the question really is, would a two-form representation be
beneficial, and if so, is it worth the coding trouble?

ChrisA