[Patches] [ python-Patches-864059 ] optimize eval_frame
SourceForge.net
noreply at sourceforge.net
Sun Mar 7 04:13:24 EST 2004
Patches item #864059, was opened at 2003-12-21 13:30
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=864059&group_id=5470
Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Neal Norwitz (nnorwitz)
>Assigned to: Neal Norwitz (nnorwitz)
Summary: optimize eval_frame
Initial Comment:
There are several different parts to this patch which
are separable. They each seemed to have a small
benefit. It would be interesting for others to test
this patch in whole and in different parts to see if
speed can be improved. I generally got between 1% and
10% improvement. I used pystone, pybench, and the
total time to run all regression tests. Runs were on a
RH9 Linux/Athlon 650. I used a non-debug build (so gcc
3.2 with -O3). All regression tests pass with these
changes.
I removed the register keyword from many variables. This seemed
to have little to no effect, so I'm not sure about
those changes. opcode does not need to be initialized to 0. I
removed the freevars variable since it is rarely used.
I think the largest benefit was from adding the gotos
for opcodes which set why: BREAK_LOOP, CONTINUE_LOOP,
RETURN_VALUE, YIELD_VALUE. This skips many tests whose
results are known a priori from the opcode.
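Illustration (not part of the patch): a minimal sketch of the goto idea in a toy interpreter loop. All names here (toy_eval, the OP_* and WHY_* constants, the fast_block_end label, the generic_checks counter) are hypothetical and do not mirror the real ceval.c. Opcodes that set why jump straight past the generic post-dispatch test, because its outcome is already known for them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes and exit reasons; not CPython's real values. */
enum toy_op  { OP_PUSH1, OP_ADD, OP_BREAK_LOOP, OP_RETURN_VALUE };
enum toy_why { WHY_NOT, WHY_EXCEPTION, WHY_BREAK, WHY_RETURN };

struct toy_result {
    enum toy_why why;     /* why the loop was left */
    int value;            /* return value, valid when why == WHY_RETURN */
    int generic_checks;   /* how often the generic post-dispatch test ran */
};

static struct toy_result toy_eval(const unsigned char *code, size_t len)
{
    struct toy_result r = { WHY_NOT, 0, 0 };
    int stack[16];
    int sp = 0;
    size_t pc = 0;

    assert(code != NULL);

    while (pc < len) {
        int err = 0;

        switch (code[pc++]) {
        case OP_PUSH1:
            if (sp < 16) stack[sp++] = 1; else err = 1;
            break;
        case OP_ADD:
            if (sp >= 2) { stack[sp - 2] += stack[sp - 1]; sp--; }
            else err = 1;
            break;
        case OP_BREAK_LOOP:
            r.why = WHY_BREAK;
            goto fast_block_end;   /* why is set: skip the generic test */
        case OP_RETURN_VALUE:
            if (sp < 1) { err = 1; break; }
            r.value = stack[--sp];
            r.why = WHY_RETURN;
            goto fast_block_end;   /* why is set: skip the generic test */
        default:
            err = 1;
            break;
        }

        /* Generic path: only opcodes that did not set why get here. */
        r.generic_checks++;
        if (err == 0)
            continue;
        r.why = WHY_EXCEPTION;

    fast_block_end:
        break;
    }
    return r;
}
```

Running PUSH1, PUSH1, ADD, RETURN_VALUE through this sketch performs the generic test three times, once per ordinary opcode, and never for RETURN_VALUE.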
I removed the special check for list in UNPACK_SEQUENCE
since this path is rarely used.
(http://coverage.livinglogic.de/file.phtml?file%5fid=12442339)
I also removed the predictions for JUMP_IF_TRUE since
this wasn't executed often (see previous URL).
I added 2 opcodes for calling functions with 0 or 1
arguments. This removed a lot of code in
call_function(). By removing test branches in several
places, this seemed to speed up the code. However, it
seemed that just specializing for 0 arguments was
better than for 1 arg. I'm not sure if the
specialization for 1 argument provides much benefit.
Both of these specializations could possibly be
improved to speed things up.
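Illustration (not part of the patch): a rough sketch of why a dedicated zero-argument call opcode can be cheaper than the generic one. The names toy_call_generic, toy_call_0 and toy_sum are hypothetical, and the oparg layout (positional count in the low byte, keyword count in the next byte) is an assumption made for the example. The generic path has to decode and validate the counts; the specialized path has nothing to decode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callable: receives a pointer to its arguments and a count. */
typedef int (*toy_func)(const int *args, int nargs);

/* Generic path: unpack both counts from oparg, check the stack depth,
   pop the arguments, then call. Returns 0 on success, -1 on error. */
static int toy_call_generic(toy_func f, int *stack, int *sp,
                            int oparg, int *result)
{
    int na = oparg & 0xff;          /* positional argument count (assumed layout) */
    int nk = (oparg >> 8) & 0xff;   /* keyword argument count (assumed layout) */

    assert(f != NULL && stack != NULL && sp != NULL && result != NULL);
    if (nk != 0 || na > *sp)
        return -1;                  /* toy: keywords unsupported, or stack too short */
    *sp -= na;
    *result = f(stack + *sp, na);
    return 0;
}

/* Hypothetical specialized path for a zero-argument call:
   no oparg decoding, no stack adjustment, no count checks. */
static int toy_call_0(toy_func f, int *result)
{
    assert(f != NULL && result != NULL);
    *result = f(NULL, 0);
    return 0;
}

/* Example callee: sums whatever arguments it is given. */
static int toy_sum(const int *args, int nargs)
{
    int total = 0;
    int i;

    for (i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```

Whether the saved branches are measurable in a real interpreter depends on the compiler and workload, which is the open question raised in the comment above.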
----------------------------------------------------------------------
>Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-07 04:13
Message:
Logged In: YES
user_id=80475
Neal, assigning back to you in case you want to pursue the
two new opcodes.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2004-02-06 13:37
Message:
Logged In: YES
user_id=80475
Added a simplified version of the goto optimization.
See Python/ceval.c 2.374
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2003-12-31 22:42
Message:
Logged In: YES
user_id=80475
The patch is promising. I'm able to measure a small speed-
up for the two new function opcodes and for the setwhy
gotos. Both optimizations make sense.
I don't measure a savings from not initializing opcode and
oparg. That change makes sense conceptually because the
variables are always assigned before use; however, the
surrounding control flow statements hide that fact from the
compiler. So, it is likely that they were initialized to
suppress warnings on somebody's system. If so, then that
change should not be made.
The other stuff should definitely be left out. The effect of
register variables will vary from compiler to compiler, so if
you can't measure an improvement, it is best to leave it
alone. Some compilers do not do much in the way of
optimization and the register declaration may be a valuable
hint.
Please leave in the branch prediction for JUMP_IF_TRUE -- I
put it in after finding measurable savings in real code. While
it doesn't come up often, when it does it should run as fast
as possible.
The special case for UNPACK_SEQUENCE is up for grabs.
When that case occurs, the speedup is substantial. Also,
given that the tuple check has failed, it becomes highly
probable that the target is a list. OTOH, this inlined code
fattens the already voluminous code for eval_frame. Maybe
eliminating it will help someone's optimizer cope with all the
code. Use your judgement on this one.
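Illustration (not the actual ceval.c code): a toy sketch of the trade-off described above. The types and names (toy_seq, toy_unpack, toy_generic_get, the PATH_* values) are hypothetical. The list branch duplicates the tuple fast path, which is the extra inlined code under discussion; removing it would send lists through the slower generic path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the object kinds an unpack can see. */
enum toy_kind { TOY_TUPLE, TOY_LIST, TOY_OTHER };
enum toy_path { PATH_ERROR, PATH_TUPLE, PATH_LIST, PATH_GENERIC };

struct toy_seq {
    enum toy_kind kind;
    const int *items;
    int len;
};

/* Stand-in for a slower, fully general item fetch (for example, iteration). */
static int toy_generic_get(const struct toy_seq *seq, int i, int *item)
{
    if (i < 0 || i >= seq->len)
        return -1;
    *item = seq->items[i];
    return 0;
}

/* Unpack exactly n items into out and report which path handled it. */
static enum toy_path toy_unpack(const struct toy_seq *seq, int n, int *out)
{
    int i;

    assert(seq != NULL && out != NULL && n >= 0);

    if (seq->kind == TOY_TUPLE) {            /* common case: direct copy */
        if (seq->len != n)
            return PATH_ERROR;
        for (i = 0; i < n; i++)
            out[i] = seq->items[i];
        return PATH_TUPLE;
    }
    if (seq->kind == TOY_LIST) {             /* the special case in question */
        if (seq->len != n)
            return PATH_ERROR;
        for (i = 0; i < n; i++)
            out[i] = seq->items[i];
        return PATH_LIST;
    }
    if (seq->len != n)                       /* everything else: generic fetch */
        return PATH_ERROR;
    for (i = 0; i < n; i++) {
        if (toy_generic_get(seq, i, &out[i]) != 0)
            return PATH_ERROR;
    }
    return PATH_GENERIC;
}
```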
Removing the freevars variable did not show any speedup. It
does keep one variable off the stack and shortens the startup
time by a few instructions. OTOH, the in-lined replacements
for it result in a net expansion of code size and cause a
microscopic slowdown whenever it is used. I recommend
leaving this one alone.
Executive summary: Only make the two big changes that
show measurable speedups and make conceptual sense.
Leave the other stuff alone.
One other thought: try making custom benchmarks for
targeted optimizations. The broad spectrum benchmarks are
too coarse to tell whether an improvement is really working.
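Illustration: one possible shape for such a targeted benchmark, sketched in C with the standard clock() function. The harness name toy_time_calls and the two candidate functions are hypothetical; the point is only that the timed loop exercises a single code path, so a difference between two variants is not drowned out by unrelated work.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical signature for the code path being measured. */
typedef int (*toy_bench_fn)(int);

/* Volatile sink so the compiler cannot discard the calls being timed. */
static volatile int toy_sink;

/* Time `iterations` calls of fn and return elapsed CPU seconds. */
static double toy_time_calls(toy_bench_fn fn, long iterations)
{
    clock_t start, end;
    long i;

    assert(fn != NULL && iterations >= 0);
    start = clock();
    for (i = 0; i < iterations; i++)
        toy_sink = fn((int)(i & 0x7f));
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Two variants that compute the same result for inputs 0..127;
   only the presence of the branch differs. */
static int toy_with_branch(int x)    { return x > 0 ? x + 1 : 1; }
static int toy_without_branch(int x) { return x + 1; }
```

Calling toy_time_calls on each variant with the same iteration count, several times over, gives numbers that can be compared directly. An optimizing compiler may still make the two variants identical, so results need checking per build.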
Also, be sure to check with Guido before adding the new
opcodes.
Ideally, each optimization should be loaded separately so its
effects can be isolated and to allow any one to be backed out
if necessary.
----------------------------------------------------------------------
Comment By: Raymond Hettinger (rhettinger)
Date: 2003-12-24 03:20
Message:
Logged In: YES
user_id=80475
I'll try these out and review the patch when I get back from
vacation next week.
The list special case for UNPACK_SEQUENCE and the
prediction for JUMP_IF_TRUE should be left in -- they do
provide speed-ups for code that exercises those features and
they don't hurt the general cases.
----------------------------------------------------------------------