Bytecode optimisation

Tue May 18 19:36:10 EDT 1999

In article <slrn7k30l4.if8.jepler at craie.inetnebr.com>,
Jeff Epler <jepler at inetnebr.com> wrote:
>On 17 May 1999 23:13:16 GMT, Corran Webster
> <cwebster at math.tamu.edu> wrote:
>>  * More complex optimisation is certainly possible, although a lot more 
>>    work would be required.  I've made a first stab at block-level analysis
>>    and it seems feasible.  In particular it would be nice to detect and 
>>    shift loop invariant code.
>>
>>  * If type information could be guessed, then more might be able to be 
>>    squeezed.  At the moment I can't simplify something like 0 * x, 
>>    because x could be anything, and could call an arbitrary function to 
>>    perform the operation.  If it was known that x was an integer, then we 
>>    could safely replace it with 0.  Assert statements could be used for
>>    this sort of thing.
>
>How can you tell if code is loop invariant without knowing the types of
>variables?  For instance:
>	for i in range(5):
>		j=j+a[0]*a[i]
>depending on the type of "a", "a[0]" might change each time it is
>calculated and not be equivalent to:
>	t=a[0]
>	for i in range(5):
>		r=r+t*a[i]
>
>I don't think you can do that in general without inferring types.

There are a few situations where it may be possible to get some minor gains
from loops simply by examining the bytecode.  For example, a for loop used
to just do something n times, without ever using the iteration variable
like:

for i in range(1000):
  do something which doesn't use i
do something else which doesn't use i

the bytecode does a redundant STORE_LOCAL i which could be replaced with a
POP_TOP.  A minor gain, but probably worth doing.

But by and large, you are right.  One concept I have been playing with is
marking some local variables as being "globally tainted" - in other words
those whose value can be changed by unknown side effects.  The behavior
on non tainted variables is more predictable.

In your example, if a was not globally tainted (for example, a list
built locally), then at least a[0] could be guaranteed constant in the
loop.  Note that in this case it wouldn't matter if a was a dict, list
or tuple; but a class instance with a __getitem__ would be globally
tainted because it executes code outside the present context.

>Perhaps you could let the user (coder) promise the optimizer that it can
>treat a particular variable as a reference to
>	. one of the builtin types "int", "string", "list", "tuple"
>	. a numeric type, suitable for CSE elimination, loop hoisting,
>	  and expression reorganization
>	. an immutable type, suitable for CSE elimination and loop hoisting
>	. a const function which always returns the same value for given
>	  arguments
>	. other things that I haven't thought of
>then you'd write something like:
>	def example(a):
>		r=0
>		for i in range(5):
>			r=r+a[0]*a[i]
>		return r
>	example_opt = optimize(example, a=Immutable, range=ConstFunction)

This is a possibility, but it may cause extremely bad behaviour if the
promise is broken at a later date (possibly by someone other than the
original coder).  But there may be some optimisations which are equivalent
to hand optimisations like this, and which are safe to do.

In fact thinking a bit more about this, if specifying types requires
too much effort, then the coder is better off writing the Python in a
more optimised form beforehand - it seems that the optimiser should
have a "paranoid" mode where nothing which might change the way the
code is executed is allowed (which will rule out a lot of code hoisting
like this), and a "smart" mode where it does what the coder would
probably do if they were optimising by hand (a "do what I mean, not
what I say" mode).

>Just some random thoughts..

Much appreciated.

Corran