Performance of int/long in Python 3
Neil Hodgson
nhodgson at iinet.net.au
Wed Apr 3 18:58:09 EDT 2013
Neil Hodgson, replying to self:
> The assembler (32-bit build) for each
> PyUnicode_READ looks like
I don't have 64-bit MSVC 2010 set up, but the code from 64-bit MSVC
2012 is better since there are 8 extra general-purpose registers in
64-bit mode:
; 10431: c1 = PyUnicode_READ(kind1, data1, i);
    cmp rsi, 1
    jne SHORT $LN17@unicode_co
    lea rax, QWORD PTR [r9+rcx]
    movzx r8d, BYTE PTR [rax+rbx]
    jmp SHORT $LN16@unicode_co
$LN17@unicode_co:
    cmp rsi, 2
    jne SHORT $LN15@unicode_co
    movzx r8d, WORD PTR [r9+r11]
    jmp SHORT $LN16@unicode_co
$LN15@unicode_co:
    mov r8d, DWORD PTR [r9+r10]
$LN16@unicode_co:
All the variables used in the loop are now in registers, but the
tests and branches are the same as in the 32-bit code. This is
consistent with the 64-bit build being faster than the 32-bit build
on Windows but still not as fast as Python 3.2 or the Unix builds.
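For anyone without the CPython source to hand, the C below is a rough sketch of the kind of three-way dispatch that PyUnicode_READ performs, which is where the two compares and branches in the listing come from. The names sketch_read and SKETCH_*BYTE are hypothetical stand-ins rather than CPython's identifiers. The kind values 1 and 2 are taken from the constants compared against in the assembler, and 4 is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the string "kind" constants.
   1 and 2 match the compares in the listing; 4 is assumed. */
enum sketch_kind { SKETCH_1BYTE = 1, SKETCH_2BYTE = 2, SKETCH_4BYTE = 4 };

/* Read the code point at `index` from a buffer whose element
   width is selected at run time by `kind`. */
static uint32_t sketch_read(int kind, const void *data, size_t index)
{
    if (kind == SKETCH_1BYTE)               /* cmp rsi, 1 */
        return ((const uint8_t *)data)[index];
    if (kind == SKETCH_2BYTE)               /* cmp rsi, 2 */
        return ((const uint16_t *)data)[index];
    /* Fall through: treated as the 4-byte kind. */
    return ((const uint32_t *)data)[index];
}
```

Since kind does not change while the loop runs, the test could in principle be hoisted out of the loop. The listing suggests that this build keeps it inside, so every character read pays for the compares and branches.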
Neil