[pypy-dev] Re: psyco in Python was: Minimal Python project

Michael Hudson mwh at python.net
Fri Jan 17 17:24:02 CET 2003


On Fri, 17 Jan 2003, Edward K. Ream wrote:

> Hi Michael,
> 
> I'm going to copy this to pypy-dev so that other people can see this
> discussion.

OK.

> > > > > I just don't see how to implement psyco in Python in a production
> > > > > package without using the C interpreter as a bootstrap!  And yes, I
> > > > > suspect that a C language version of psyco will be faster than psyco
> > > > > in Python.
> 
> > > > Even after you've run psyco over itself?
> > >
> > > Yes, maybe even then.  C compilers should do better _on the same
> > > algorithm_ on hand-written code than can any JIT.
> 
> > Weeeeellll, this is the thing that gets me jumping-up-and-down-excited
> > about things like psyco: psyco has MORE information than the C compiler to
> > work with: while the C compiler only knows the algorithm to run, psyco
> > knows both the algorithm *and the data it is to run on*.  Obviously, this
> > approach only wins if what you do next resembles what you did last, but I
> > think CPU caches have proved there is some mileage in that idea :)
> 
> I don't think CPU caches have much bearing on this discussion.  Caches speed
> up frequently used values.  The problems here are not the same.

That wasn't my point.  CPU caches derive their benefit (in part) from the 
assumption that what a program did recently, it will do again.

psyco's specializations derive their benefit from the same assumption.

Also, I wasn't so much talking about psyco-as-is, more about psyco-as-may-be, 
which is one of the reasons I'm trying to stop posting stuff like this...
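To put a toy illustration on that shared assumption (this is my sketch in plain Python, not psyco's actual machinery): a specializer builds a version of a function for the argument types it has actually seen, betting that the next call will look like the last one -- the same bet a CPU cache makes.

```python
def specializing(func):
    cache = {}  # observed argument types -> "specialized" version

    def dispatch(*args):
        key = tuple(type(a) for a in args)
        specialized = cache.get(key)
        if specialized is None:
            # First time we see these types: build a version for them.
            # A real specializer would emit machine code here; this
            # sketch just records the types it was built for.
            specialized = make_specialized(func, key)
            cache[key] = specialized
        return specialized(*args)

    dispatch.cache = cache
    return dispatch

def make_specialized(func, types):
    def specialized(*args):
        # The specialized version may assume these exact types.
        assert tuple(type(a) for a in args) == types
        return func(*args)
    return specialized

@specializing
def add(a, b):
    return a + b

add(1, 2)       # int/int: builds the first specialization
add(3, 4)       # int/int again: hits the cache
add("x", "y")   # str/str: builds a second specialization
```

As long as calls keep resembling recent calls, the cached specializations keep paying off; when they stop resembling them, you pay to build new ones.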

> The more I study psyco, the more I am convinced that there is _no way_ that
> psyco will ever produce better code than a C compiler.  In brief, my reasons
> are as follows:
> 
> 1. psyco has, in fact, no better information than does a C compiler.
> Remember that the "values" that psyco specializes upon are in fact
> _interpreter_ vars, so knowing those values is equivalent to knowing the
> types of operands.  This does indeed allow psyco to emit machine code that
> is better than the equivalent C interp generic code.  In fact, though, most
> of what psyco is doing is quite routine: replacing bytecode by assembly
> language.  This will speed up the code a _lot_, but it is doing just what a
> C compiler does.

You've clearly spent more time looking at psyco lately than me: I don't 
understand the first half of this paragraph!  Please *don't* feel obliged 
to educate me, I wouldn't have enough time to do anything with the new 
knowledge...

> 2. I have looked again at the code generators of the optimizing C compiler I
> wrote about 10 years ago.  I can find no instance where knowing the value
> of an operand, rather than just its type, would produce better code. 

?  Surely in compiling

if (x < 10) {
    ...
} else {
    ...
}

knowing that x is less than 10 only 5% of the time would help?  I'm pretty 
sure there are compilers which allow you to hint the outcome of any given 
test, so there must be *some* use for such data.

> Yes, my compiler did constant folding (which eliminates code rather than
> improves it), but I do not believe doing constant folding at runtime is
> going to improve matters much, and is in fact very likely to slow down
> the running times of typical programs, for reasons given below.

It seems constant folding isn't much of an issue in Python, anyway.

> 3. It's easy to get seduced by the "psyco has more knowledge" mantra.  In
> fact, one mustn't lose track of the obstacles psyco faces in outperforming C
> compilers.

I sincerely hope I never gave the idea that this stuff would be easy.

> First, and very important, everything that psyco does must be done at
> run time. 

Well, yeah.

> This means that even very effective optimizations like peephole
> optimizations become dubious for psyco.  Second, and also very
> important, is the fact that psyco (or rather the code that psyco
> produces) must determine whether the code that psyco has already
> produced is suitable for the task at hand.

This is a more subtle issue.

OTOH, there seem to be benefits from this kind of caching in algorithms that 
people implement IN SILICON.  I'm thinking of things like the P4's trace 
cache here.  It staggers me that that thing does enough to help, but I 
gather it does.

> The "Structure of Psyco" paper says this:  "Dispatching is the set of
> techniques used to store a trace of the compiled code buffers and _find if
> one already matches the current compiler state_" (emphasis mine)  In other
> words, it's not enough to specialize code!  Psyco must determine whether the
> types used to compile a function match the types presently given to psyco.
> This searching is done in the "compatible" routine, and no matter how it is
> done is _pure overhead_ compared to the code generated by a C compiler.

Well, hopefully you can hoist these choices of specialization out of any 
core loops.  80/20 rules and so on.  It seems fairly likely that psyco 
will be of most benefit when the dispatcher is invoked least often, i.e. 
psyco should work with "fairly" large lumps of code.  Though of course 
there are trade-offs here, as everywhere.
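A sketch of that hoisting idea (mine, using an ordinary dict as a stand-in for psyco's "compatible" search): the dispatch lookup is pure per-call overhead, but paying it once per large lump of work amortizes it.

```python
def dispatch(cache, value):
    """Find an already-specialized routine for this value's type --
    a stand-in for the 'compatible' search, paid on every entry."""
    return cache[type(value)]

def sum_dispatch_per_element(xs, cache):
    # Worst case: the lookup runs on every iteration of the hot loop.
    total = 0
    for x in xs:
        total = dispatch(cache, x)(total, x)
    return total

def sum_dispatch_hoisted(xs, cache):
    # Hoisted: one lookup for the whole loop.  Only valid while the
    # assumption that xs is homogeneous actually holds.
    add = dispatch(cache, xs[0])
    total = 0
    for x in xs:
        total = add(total, x)
    return total

int_cache = {int: lambda a, b: a + b}
assert sum_dispatch_per_element([1, 2, 3], int_cache) == 6
assert sum_dispatch_hoisted([1, 2, 3], int_cache) == 6
```

The trade-off is exactly the one above: bigger lumps mean fewer dispatches, but also coarser assumptions that are more expensive to invalidate.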

> As I see it, both these issues are "gotchas".  Neither can be eliminated
> completely, and each introduces overhead that will make the code emitted by
> psyco clearly inferior to the code produced by any decent C compiler. In
> particular, the more psyco specializes on runtime values of operands the
> bigger the code bloat and the more severe the problem of finding out what
> already-compiled code to use.

Yes.  Fun with the I-cache, too.

> 4. I have seen _nothing_ in psyco's algorithms or the code emitted by psyco
> that leads me to believe that psyco is "doing enough work" to outperform a C
> compiler.

Sorry, OF COURSE it's not doing enough work NOW.  If my postings gave 
the impression that anyone thinks psyco can beat a decent C compiler 
today, I am sincerely sorry, and I apologise to you and anyone else who 
got that impression.

> When I was a mathematics graduate student 30 years ago one of my professors
> made the remark that a mathematician needs to develop a feel for the level of

Hey, I'm a mathematics graduate student *now*!

> work required to do a new proof.  Deep insights are not going to come from
> elementary methods, 

Hmm.

> and there is no use in using deep methods to prove elementary results.

More hmm.

Not sure my disagreements with those statements are relevant here, though.

[...]

> 5. Another way to see that psyco isn't really doing much is the following.
> Basically, psyco emits calls to the same runtime helpers that the C
> interpreter does.  Yes, psyco may improve the calling sequences to these
> helpers, but it does not eliminate calls to those helpers.  

I thought the goal of this project was to reduce the number of these 
helpers...
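To sketch what reducing the helpers buys (my illustration, not psyco's code): a generic helper has to re-discover the operand types on every call, whereas code specialized for proven types can skip the checks, and that is where a helper call can be eliminated entirely.

```python
def generic_add_helper(a, b):
    # What an interpreter's runtime helper does: re-check the
    # operand types on every single call.
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    if isinstance(a, str) and isinstance(b, str):
        return a + b
    raise TypeError("unsupported operand types")

def int_add_specialized(a, b):
    # Emitted only after the specializer has proved both operands
    # are ints: no checks remain, so nothing needs to be called.
    return a + b

assert generic_add_helper(2, 3) == int_add_specialized(2, 3) == 5
```

A C compiler sees `a + b` on declared ints and gets the second form for free; the question is how often a specializer can prove its way there too.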

> But a C compiler will never do worse than the code in the helpers, and
> will often do _much_ better.
> 
> I challenge this group to provide even one example of Python code, for which
> psyco will emit better code than that produced by a C compiler on a
> transliteration of that code into C.

I am confident that no such example exists.

> I believe no such counter-example will ever be found.

For today's psyco, yes.  For tomorrow, who knows?

It's not like C is an ideally designed language for writing high
performance applications in (here I'm thinking of things like the need for 
the restrict keyword in C99).

[...]

> P.S.  I trust that this group will interpret my remarks as constructive.  I
> believe it is important at the beginning of any project to have proper goals
> and expectations.  Make no mistake: I believe this project has the potential
> to be extremely valuable, even if my arguments are correct.  And I would
> indeed be happy to be _proved_ wrong.  

I also hope that my comments don't produce undue enthusiasm.  We don't 
have a single line of code yet!

[...]

Something else, to finish.  Do you know of HP's dynamo project?

   http://www.arstechnica.com/reviews/1q00/dynamo/dynamo-1.html

is *fascinating* reading.  They were investigating issues of dynamic 
binary translation and started just by "translating" PA-RISC code to 
PA-RISC code.  Some optimizations later, and they noticed that the 
"translated" code sometimes ran 20% faster than the original!  
Basically because of trace caches.

Cheers,
M.


