[Python-Dev] Reordering opcodes (PEP 203 Augmented Assignment)

M.-A. Lemburg mal@lemburg.com
Fri, 28 Jul 2000 18:07:21 +0200


"Eric S. Raymond" wrote:
> 
> M.-A. Lemburg <mal@lemburg.com>:
> >            LOAD_FAST(124) :   19323126 ================================
> >           SET_LINENO(127) :   15055591 ========================
> >           LOAD_CONST(100) :    9254683 ===============
> >            LOAD_NAME(101) :    8218954 =============
> >          LOAD_GLOBAL(116) :    7174876 ===========
> >           STORE_FAST(125) :    5927769 =========
> >              POP_TOP(  1) :    5587424 =========
> >        CALL_FUNCTION(131) :    5404709 ========
> >        JUMP_IF_FALSE(111) :    5289262 ========
> >           COMPARE_OP(106) :    4495179 =======
> >            LOAD_ATTR(105) :    3481878 =====
> >           BINARY_ADD( 23) :    3420811 =====
> >         RETURN_VALUE( 83) :    2221212 ===
> >           STORE_NAME( 90) :    2176228 ===
> >           STORE_ATTR( 95) :    2085338 ===
> >        BINARY_SUBSCR( 25) :    1834612 ===
> >        JUMP_ABSOLUTE(113) :    1648327 ==
> >         STORE_SUBSCR( 60) :    1446307 ==
> >         JUMP_FORWARD(110) :    1014821 =
> >      BINARY_SUBTRACT( 24) :     910085 =
> >            POP_BLOCK( 87) :     806160 =
> >         STORE_GLOBAL( 97) :     779880 =
> >             FOR_LOOP(114) :     735245 =
> >           SETUP_LOOP(120) :     657432 =
> >        BINARY_MODULO( 22) :     610121 =
> >                   32( 32) :     530811
> >                   31( 31) :     530657
> >      BINARY_MULTIPLY( 20) :     392274
> >         SETUP_EXCEPT(121) :     285523
> 
> Some thoughts:
> 
> 1. That looks as close to a Poisson distribution as makes no difference.
>    I wonder what that means?

I'd say there is a good chance that optimizations applied to
the Python byte code would pay off -- someone with enough VC
should look into this seriously ;-)

I think that highly optimized Python byte code compilers/
interpreters would make nice commercial products which
complement the targeted Python+Batteries distros.
 
> 2. Microtuning in the implementations of the top 3 opcodes looks indicated,
>    as they seem to constitute more than 50% of all calls.

Separating out LOAD_FAST from the switch shows a nice effect.
SET_LINENO is removed by -OO anyway, so there's really no
use in optimizing this one.
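
The 50% figure in point 2 can be checked against the counts quoted
above. This is a minimal sketch that assumes the listing is complete;
it looks like a top-N excerpt, so the real shares are probably a bit
lower. The helper name top_share is made up for illustration.

```python
# Counts copied from the opcode listing quoted above.
# "31" and "32" are the two opcodes the listing shows without a name.
counts = {
    "LOAD_FAST": 19323126, "SET_LINENO": 15055591, "LOAD_CONST": 9254683,
    "LOAD_NAME": 8218954, "LOAD_GLOBAL": 7174876, "STORE_FAST": 5927769,
    "POP_TOP": 5587424, "CALL_FUNCTION": 5404709, "JUMP_IF_FALSE": 5289262,
    "COMPARE_OP": 4495179, "LOAD_ATTR": 3481878, "BINARY_ADD": 3420811,
    "RETURN_VALUE": 2221212, "STORE_NAME": 2176228, "STORE_ATTR": 2085338,
    "BINARY_SUBSCR": 1834612, "JUMP_ABSOLUTE": 1648327, "STORE_SUBSCR": 1446307,
    "JUMP_FORWARD": 1014821, "BINARY_SUBTRACT": 910085, "POP_BLOCK": 806160,
    "STORE_GLOBAL": 779880, "FOR_LOOP": 735245, "SETUP_LOOP": 657432,
    "BINARY_MODULO": 610121, "32": 530811, "31": 530657,
    "BINARY_MULTIPLY": 392274, "SETUP_EXCEPT": 285523,
}


def top_share(counts, n):
    """Fraction of all listed executions covered by the n most frequent opcodes."""
    ordered = sorted(counts.values(), reverse=True)
    return sum(ordered[:n]) / sum(ordered)


print("top 3: %.1f%%" % (100 * top_share(counts, 3)))  # top 3: 39.2%
print("top 5: %.1f%%" % (100 * top_share(counts, 5)))  # top 5: 53.0%
```

Over the listed opcodes alone, the top three come to roughly 39% of
executions, and it takes the top five to pass 50%.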
 
In my hacked-up version I've also moved the signal handler
into the second switch (along with SET_LINENO). The downside
is that your program will only "see" signals when it happens
to execute one of the less common opcodes. On the plus side,
you get an additional boost in performance -- if your app
doesn't rely on signals to work, this is a great way to
squeeze out a little more speed.
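
To illustrate the shape of this change, here is a toy stack machine
in Python. It is not the ceval.c patch: only the opcode names come
from the listing above, while run(), check_signals and the
(opcode, argument) instruction format are invented for the sketch.
It assumes LOAD_FAST is the only opcode on the fast path.

```python
# Toy interpreter loop, NOT CPython's ceval.c. Opcode numbers are arbitrary.
LOAD_FAST, LOAD_CONST, STORE_FAST, BINARY_ADD, RETURN_VALUE = range(5)


def run(code, consts, nlocals, check_signals=lambda: None):
    """Execute a list of (opcode, argument) pairs and return the result."""
    stack = []
    fastlocals = [None] * nlocals
    pc = 0
    while True:
        op, arg = code[pc]
        pc += 1

        # Fast path: the most frequent opcode is tested before the general
        # dispatch and skips the housekeeping below entirely.
        if op == LOAD_FAST:
            stack.append(fastlocals[arg])
            continue

        # Slow path: signal polling only happens for the rarer opcodes.
        check_signals()

        if op == LOAD_CONST:
            stack.append(consts[arg])
        elif op == STORE_FAST:
            fastlocals[arg] = stack.pop()
        elif op == BINARY_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError("unknown opcode %r" % (op,))


# x = 21; return x + x
program = [
    (LOAD_CONST, 0), (STORE_FAST, 0),
    (LOAD_FAST, 0), (LOAD_FAST, 0),
    (BINARY_ADD, None), (RETURN_VALUE, None),
]
polls = []
result = run(program, consts=[21], nlocals=1,
             check_signals=lambda: polls.append(1))
print(result, len(polls))  # 42 4
```

Six instructions run but the signal check fires only four times,
because the two LOAD_FAST instructions never reach it.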

> 3. On the other hand, what do you get when you weight these by average
>    time per opcode?

Haven't tested this, but even simply reordering the cases
according to the above stats gives a positive result in
pybench and pystone.
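
A case order like this can be derived mechanically from the stats.
The sketch below sorts by raw count and, optionally, by count times
a per-opcode cost as suggested in point 3. The function name
case_order is made up, and the cost of 10.0 for CALL_FUNCTION is a
placeholder, not a measurement. Whether case order matters at all
depends on how the C compiler lowers the switch.

```python
def case_order(counts, cost=None):
    """Return opcode names, hottest first.

    cost maps an opcode name to a relative time per execution;
    opcodes missing from cost are weighted 1.0.
    """
    cost = cost or {}
    return sorted(counts,
                  key=lambda name: counts[name] * cost.get(name, 1.0),
                  reverse=True)


# A few counts taken from the listing quoted above.
sample = {
    "LOAD_CONST": 9254683,
    "BINARY_ADD": 3420811,
    "LOAD_FAST": 19323126,
    "CALL_FUNCTION": 5404709,
}

# Ordered by frequency alone.
by_count = case_order(sample)
print(by_count)

# Ordered by frequency times a HYPOTHETICAL relative cost.
by_weight = case_order(sample, cost={"CALL_FUNCTION": 10.0})
print(by_weight)
```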

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/