Performance of int/long in Python 3

Neil Hodgson nhodgson at iinet.net.au
Wed Apr 3 18:58:09 EDT 2013


Neil Hodgson, replying to self:

> The assembler (32-bit build) for each
> PyUnicode_READ looks like

    I don't have 64-bit MSVC 2010 set up, but the code from 64-bit MSVC 
2012 is better since there are 8 extra registers in 64-bit mode:

; 10431:         c1 = PyUnicode_READ(kind1, data1, i);
	cmp	rsi, 1
	jne	SHORT $LN17@unicode_co
	lea	rax, QWORD PTR [r9+rcx]
	movzx	r8d, BYTE PTR [rax+rbx]
	jmp	SHORT $LN16@unicode_co
$LN17@unicode_co:
	cmp	rsi, 2
	jne	SHORT $LN15@unicode_co
	movzx	r8d, WORD PTR [r9+r11]
	jmp	SHORT $LN16@unicode_co
$LN15@unicode_co:
	mov	r8d, DWORD PTR [r9+r10]
$LN16@unicode_co:

    All the variables used in the loop are now in registers, but the 
tests and branches are the same as in the 32-bit code. This is consistent 
with 64-bit being faster than 32-bit on Windows but still not as fast as 
Python 3.2 or Unix.
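
    For anyone who doesn't read assembler, here is a rough C sketch of 
what the listing is doing on every character. This is not the CPython 
source: sketch_read, sketch_compare and the SKETCH_* constants are made-up 
names for illustration, and the loop assumes the surrounding function is 
comparing two strings one character at a time. The kind values 1, 2 and 4 
are the bytes per character, as with PyUnicode_READ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the string "kind": bytes per character. */
enum { SKETCH_1BYTE = 1, SKETCH_2BYTE = 2, SKETCH_4BYTE = 4 };

/* Read the character at index i, picking the element width from kind.
   These are the two tests and branches visible in the listing:
   kind == 1 -> byte load, kind == 2 -> word load, otherwise dword load. */
static uint32_t sketch_read(int kind, const void *data, size_t i)
{
    if (kind == SKETCH_1BYTE)
        return ((const uint8_t *)data)[i];
    if (kind == SKETCH_2BYTE)
        return ((const uint16_t *)data)[i];
    return ((const uint32_t *)data)[i];
}

/* Assumed shape of the loop: kind is re-tested for each character of
   each string, even though it never changes inside the loop. */
static int sketch_compare(int kind1, const void *data1,
                          int kind2, const void *data2, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t c1 = sketch_read(kind1, data1, i);
        uint32_t c2 = sketch_read(kind2, data2, i);
        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
    }
    return 0;
}
```

    Since kind is fixed for the whole string, the per-character tests are 
loop-invariant. Having more registers removes memory traffic but does 
nothing about those branches, which fits the 64-bit listing above.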

    Neil


