[Python-Dev] Re: SET_LINENO killer

Michael Hudson mwh@python.net
19 Aug 2002 10:39:05 +0100


Tim Peters <tim.one@comcast.net> writes:

> [Michael Hudson]
> > ...
> > This makes no sense; after you've commented out the trace stuff, the
> > only difference left is that the switch is smaller!
> 
> When things like this don't make sense, it just means we're naive <wink>.
> The eval loop overwhelms most optimizers via a crushing overload of "too
> many" variables and "too many" basic blocks connected via a complex
> topology, and compiler optimization phases are in the business of using
> (mostly) linear-time heuristics to solve exponential-time optimization
> problems.  IOW, the performance of the eval loop is as touchy as a
> heterosexual sailor coming off 2 years at sea, and there's no predicting
> what minor changes will do to speed.  This has been observed repeatedly by
> everyone who has tried to speed it, across many platforms, and across a
> decade of staring at it:  the eval loop is in unstable equilibrium on its
> best days.

I knew all this, but was still surprised by the magnitude of the slowdown.

> In the limit, the eval loop "should be" a little slower now under -O, just
> because we've added another test + taken-branch to the normal path.  From
> that POV, your
> 
> > FWIW gcc makes my patch a small win even with -O.
> 
> is as much "a mystery" as why MSVC 6 hates it.

No kidding.

I wonder if some of the slowdown comes from repeatedly hauling the
threadstate into the cache.  I guess wonderings like this are almost
exactly valueless.
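
Still, to make the wondering concrete: the shape of the extra
per-instruction work is roughly the following.  This is a toy sketch
with made-up names (toy_tstate, maybe_line_trace, the "3 opcodes per
line" rule), not the actual ceval.c code -- the real thing recomputes
the bounds from co_lnotab -- but it shows where the extra load of
tstate, the extra test + branch, and the two new locals come in.

/* Toy model of the fast-path tracing check -- a sketch only, not ceval.c. */
#include <stdio.h>

typedef void (*trace_func)(int lasti);

struct toy_tstate {
    trace_func c_tracefunc;            /* NULL when nothing is tracing */
};

static void
maybe_line_trace(struct toy_tstate *tstate, int lasti,
                 int *instr_lb, int *instr_ub)
{
    if (tstate->c_tracefunc == NULL)
        return;                        /* the extra test + branch per opcode */
    if (lasti >= *instr_lb && lasti < *instr_ub)
        return;                        /* still inside the cached "current line" */
    /* Left the cached range: fire a line event and refresh the bounds
       (the real code works this out from co_lnotab). */
    tstate->c_tracefunc(lasti);
    *instr_lb = lasti;
    *instr_ub = lasti + 3;             /* pretend every source line is 3 opcodes wide */
}

static void
print_trace(int lasti)
{
    printf("line event at offset %d\n", lasti);
}

int
main(void)
{
    struct toy_tstate tstate = { print_trace };
    int instr_lb = 0, instr_ub = 0;    /* the two extra locals / 8 stack bytes */
    int lasti;

    for (lasti = 0; lasti < 10; lasti++)
        maybe_line_trace(&tstate, lasti, &instr_lb, &instr_ub);
    return 0;
}

The intent is that the slow path only runs when f_lasti crosses a line
boundary, so the steady-state cost is one extra load-test-branch per
opcode -- which is also why the threadstate gets touched on every trip
round the loop.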

> > Actually, there are some other changes, like always updating f->f_lasti,
> > and allocating 8 more bytes on the stack.  Does commenting out the
> > definition of instr_lb & instr_ub make any difference?
> 
> I'll try that on Tuesday, but don't hold your breath.  It could be that I
> can get back all the loss by declaring tstate volatile -- or doing any other
> random thing <wink>.
> 
> > ...
> > Does reading assembly give any clues?  Not that I'd really expect
> > anyone to read all of the main loop...
> 
> I will if it's important, but a good HW simulator is a better tool for this
> kind of thing, and in any case I doubt I can make enough time to do what
> would be needed to address this for real.

On Linux there's cachegrind, which comes with valgrind and might prove
helpful.  But that only runs on Linux, and I'm not sure I want to
explain the Linux mystery, as it might go away :)

> > I'm baffled.
> 
> Join the club -- we've held this invitation open for you for years <wink>.

Attempting a PhD in mathematics is providing enough bafflement for
this schmuck, but thanks for the offer.

> > Perhaps you can put SET_LINENO back in for the Windows build
> > <1e-6 wink>.
> 
> If it's an unfortunate I-cache conflict among heavily-hit code addresses
> (something a good HW simulator can tell you), that could actually solve it!
> Then anything that manages to move one of the colliding code chunks to a
> different address could yield "a mysterious speedup".  These mysteries are
> only irritating when they work against you <wink>.

Well, quite.  Let's send Julian Seward an email asking him if he wants
to port valgrind to Windows <wink>.

Cheers,
M.

-- 
  surely, somewhere, somehow, in the history of computing, at least
  one manual has been written that you could at least remotely
  attempt to consider possibly glancing at.              -- Adam Rixey