Bytecode optimisation

Jim Meier fatjim at home.com
Wed May 19 03:31:10 EDT 1999


Corran Webster wrote:

> In article <37420CBD.FF15564 at appliedbiometrics.com>,
> Christian Tismer  <tismer at appliedbiometrics.com> wrote:
> >
> >Corran Webster wrote:
> >>
> >> Or as I mentioned in my reply to Mark-Andre Lemburg, there may be a case
> >> for an optimisation level which assumes sanity for many standard things,
> >> so that
> >>
> >> for x in range(1000):
> >>   for y in range(1000):
> >>     do_stuff()
> >>
> >> could assume that do_stuff() doesn't change the meaning of range and so
> >> this could be optimised to:
> >>
> >> r = range(1000)
> >> for x in r:
> >>   for y in r:
> >>     do_stuff()
> >
> >Looks promising in the first place. But you quickly realize
> >that this is no real optimization since you will do it
> >by hand, anyway, or use xrange.
> >And knowing if something is an xrange on every thousand's
> >run through the cycle, what does it help?

...but if we know that range is the standard built-in, and if we can check once
that "range" hasn't been redefined, then we can replace the call to range,
which involves a name lookup and Python-to-C argument parsing, with a premade
list, can't we? That would be a speed win, wouldn't it? Just spouting off here...
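
Something like this hand-rolled version is roughly what I mean; it's only a
sketch, and do_stuff, nested_loops and _builtin_range are names I made up:

_builtin_range = range          # grab the built-in before anything can rebind it

def do_stuff(x, y):
    return x * y

def nested_loops(n=1000):
    if range is _builtin_range:
        # "range" still means the built-in, so one result can be reused.
        r = range(n)            # built once instead of n + 1 times
        for x in r:
            for y in r:
                do_stuff(x, y)
    else:
        # Someone rebound "range"; fall back to the fully dynamic form.
        for x in range(n):
            for y in range(n):
                do_stuff(x, y)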

It seems to me, and it looks like the general consensus, that the big speed
blockage in Python optimisation is the fact that so many things are
unpredictable. But many of the built-in functions could be checked against the
address of the unchangeable C code and have their results placed as constants
in the function; or just have the functions themselves placed as constants to
avoid the lookup. (Yes, I know the lookup is fast. Like I said, just spouting
ideas that everyone has probably already had.)
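
For the "functions placed as constants" part, the closest thing you can do by
hand is bind the built-in into a default argument, so it is looked up once at
def time instead of on every call. A tiny sketch; count_chars and _len are
made-up names:

def count_chars(lines, _len=len):   # _len is bound once, when the def executes
    total = 0
    for line in lines:
        total += _len(line)         # fast local lookup, no per-call name lookup
    return total

print(count_chars(["spam", "eggs"]))   # -> 8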

> And maybe it'll turn out that it's not worth the effort.
>
> I'm really doing this for fun and as a project so I can learn a bit more
> about Python internals.  If it bears fruit, great!  If not, well, I'll
> have learned a fair bit in the process.
>
> >Guys, you are at the wrong end. This is another complete waste
> >of time. The whole interpreter overhead is bound to an average
> >of about 30 percent, when everything is C-code. I'd wonder if
> >bytecode optimization would give more than 15 percent on
> >average, when semantics are not changed.
> >
> >Do little enhancements with moderate effort.
> >And check the optimization results of last year's
> >conference. It's overall pretty good work but discouraging
> >results.
> >
> >You need to invent special new bytecodes for the things
> >you want to optimize. If you can find and implement these,
> >then it makes sense to optimize with type inference.
>
> Point taken.
>
> At the moment I'm exploring and playing.  Maybe it'll all be for nothing,
> but I'm really just discovering what's possible with Python in its
> present form.
>
> >no-longer-optimizing-for-less-than-50-percent-ly y'rs - chris
>
> will-be-happy-if-the-optimiser-takes-twice-as-long-to-run-as-the-time-
> -it-saves-ly y'rs,
> Corran

The thread on "swallow" seems to have dried up; maybe they've gone off to
experiment with it? In any case, these ideas and the ideas from that thread run
into each other head-on. If a satisfactory way of finding "static" functions and
type-constant variables could be worked out, then that markup could be used both
to optimise the Python bytecodes and to generate native code directly. Depending
on the speed of the system and how often the optimisation is done, this could
even happen automatically in the interpreter.

It may be necessary to keep two or more copies of a function around (if you
want both speed and flexibility): one optimised to flash powder in native
code, one in the original Python bytecode. When doing its lookups, the
interpreter would make a quick second check for a) whether the types are known,
and b) if they are, whether they match the types assumed in the optimisation.
If the check passes, it runs the optimised version; if not, it runs the
standard version.
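
As a toy sketch of that dispatch idea (all of these names, make_dual,
add_general and add_ints, are hypothetical, and a real version would live
inside the interpreter rather than in Python code):

def make_dual(general, specialised, expected_types):
    def dispatch(*args):
        if len(args) == len(expected_types) and all(
                type(a) is t for a, t in zip(args, expected_types)):
            return specialised(*args)   # types match the optimiser's assumptions
        return general(*args)           # anything else runs the standard version
    return dispatch

def add_general(a, b):
    return a + b                        # fully dynamic; works for any types with +

def add_ints(a, b):
    return a + b                        # stand-in for an int-only "native" body

add = make_dual(add_general, add_ints, (int, int))
print(add(2, 3))        # takes the specialised path
print(add("a", "b"))    # falls back to the general path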

(This might mean that only "swallowed" code can call other "swallowed" code
without wrappers. I don't think that's a big deal?)

I don't know much about bytecode interpreters and their design/implementation;
perhaps someone with experience could supply pointers to relevant articles and
information sources? Preferably web-based, since not all of us have a
university-class library to borrow from :).

Now I'm going to go look through some of the standard libraries to see how much
could be gained by compiling away just the static parts. Before looking, I'm
assuming one of two things:
    a) lots of speedup is possible, because the libraries often implement very
basic things that are done repeatedly and without much "semantic dynamism";
    or b) little speedup, because the libraries are often wrappers around
already-fast C modules specifically to ADD "semantic dynamism".

So, let's see.
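
One rough way to put a number on it: count how often a library function reaches
for a global or built-in name, since those are exactly the lookups a "static"
optimiser could pin down as constants. A sketch using the dis module
(get_instructions is its newer programmatic interface; plain dis.dis output
shows the same thing), with posixpath.join as an arbitrary stdlib example:

import dis
import posixpath

def global_loads(func):
    """Count LOAD_GLOBAL instructions in a function's bytecode."""
    return sum(1 for ins in dis.get_instructions(func)
               if ins.opname == "LOAD_GLOBAL")

print(global_loads(posixpath.join))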

(PS, what's the proper term for "semantic dynamism"?)


-Jim.




