[Python-Dev] Who cares about the performance of these opcodes?

Phillip J. Eby pje at telecommunity.com
Tue Mar 9 08:59:52 EST 2004


At 07:38 AM 3/9/04 -0600, Jeff Epler wrote:
>Recently it was proposed to make a new LIST_APPEND opcode, and several
>contributors pointed out that adding opcodes to Python is always a dicey
>business because it may hurt performance for obscure reasons, possibly
>related to the size of that 'switch' statement.
>
>To that end, I notice that there are several opcodes which could easily
>be converted into function calls.  In my code, these are not typically
>performance-critical opcodes (with approximate ceval.c line count):
>     BUILD_CLASS             # 9 lines
>     MAKE_FUNCTION           # 20 lines
>     MAKE_CLOSURE            # 35 lines
>
>     PRINT_EXPR              # 21 lines
>     PRINT_ITEM              # 47 lines
>     PRINT_ITEM_TO           # 2 lines + fallthrough
>     PRINT_NEWLINE           # 12 lines
>     PRINT_NEWLINE_TO        # 2 lines + fallthrough
>
>Instead, each of these would be available in the code object's co_consts
>when necessary.  For example, instead of
>     LOAD_CONST               1 (<code object g at 0x40165ea0, file "<stdin>", line 2>)
>     MAKE_FUNCTION            0
>     STORE_FAST               0 (g)
>you'd have
>     LOAD_CONST               1 (type 'function')
>     LOAD_CONST               2 (<code object g>)
>     LOAD_GLOBALS                                 # new opcode, or call globals()
>     LOAD_CONST               3 ("g")
>     CALL_FUNCTION            3
>
>Performance for these specific operations will certainly benchmark worse,
>but maybe getting rid of something like 150 lines from ceval.c would
>help other things by magic.  The new LOAD_GLOBALS opcode would be less
>than 10 lines.
>
>No, I don't have a patch.  I assume each and every one of these opcodes
>has a staunch defender who will now come to its aid, and save me the
>trouble.
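
You're right that the LOAD_GLOBALS opcode you describe would be tiny.
Untested, and purely a sketch (the opcode doesn't exist today, of course),
but the ceval.c fragment would amount to little more than pushing the
frame's globals dict:

case LOAD_GLOBALS:
    x = f->f_globals;           /* borrowed reference held by the frame */
    Py_INCREF(x);
    PUSH(x);
    continue;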

If the goal is to remove lines from the switch statement, just move the 
code of lesser-used opcodes into a C function.  There's no need to 
eliminate the opcodes themselves.
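
For instance (a sketch only, not a real patch -- the helper name is made
up, and I'm leaving out the default-argument handling the real
MAKE_FUNCTION case does when oparg > 0):

static PyObject *
make_function_helper(PyObject *codeobj, PyObject *globals)
{
    /* the ~20 lines now inlined in the switch would live here;
       the core of it is just the constructor call */
    return PyFunction_New(codeobj, globals);
}

/* ...and back in the switch, the case shrinks to a call: */
case MAKE_FUNCTION:
    v = POP();                          /* the code object */
    x = make_function_helper(v, f->f_globals);
    Py_DECREF(v);
    PUSH(x);
    break;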

I personally don't think it'll help much if the goal is to reduce cache 
misses; after all, the code is all still there.  But it should not do as 
badly as the approach you're suggesting, because your version incurs not 
only the C-level calls but also more bytecodes to interpret.

Hm.  Makes me wonder, actually, if a hand-written eval loop in assembly 
code might not kick some serious butt.  Or maybe a bytecode-to-assembly 
translator, writing loads in-line and using registers as the stack, calling 
functions where necessary.  Ah, if only I were a teenager again, with 
little need to sleep, and unlimited time to hack...  :)



