Why is Python so slow? - revisited.

Thomas Wouters thomas at xs4all.net
Tue Jun 20 08:32:56 EDT 2000


On Mon, Jun 19, 2000 at 08:47:09PM -0400, Bijan Parsia wrote:
> Thomas Wouters <thomas at xs4all.net> wrote:

[ Python's lack of an instruction/function cache ]

> > Indeed. Such a cache would have to be invalidated whenever the function
> > object or one of the objects used to lookup the function, was altered (or
> > rather, 'touched'.) The dynamic nature of Python makes this quite
> > complicated, and probably quite expensive.

> I'd love to hear more about why this would be the case. (Mere
> "dynamicness" can't be it: Smalltalk is as dynamic and most
> implementations make quite effective use of various sorts of cache. So,
> if your contention is correct, there must be some *special* aspect of
> Python's dynamic nature that makes it *especially* resistant. Note that
> I am *very* skeptical about the existence of such a special aspect, but
> I am quite willing to be enlightened! Merely being "more dynamic" or "so
> very dynamic", which is what I usually read, doesn't cut any mustard.

I know next to nothing about Smalltalk, let alone how it optimizes, but I
can tell you why it would be complicated in Python ;) Let's take a simple
example:

def spam(x, y):
    if x in range(1, y):
        return 1
    if x in range(y, 1):
        return 1
    return 0

Might not seem very useful, but it's just an example ;) This can easily be
optimized, right? We look up 'range', it's a builtin, we store the address,
and call it two times. Well, no. We don't *know* range is a builtin, it
could be a global that points elsewhere. If the 'spam' function contained a
'from module import *' or 'exec' statement, it could even be a local.
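A small sketch of that shadowing (the fake_range name is hypothetical, just for illustration):

```python
def spam(x, y):
    # 'range' is resolved at call time: locals, then globals, then builtins
    if x in range(1, y):
        return 1
    return 0

before = spam(2, 5)          # builtin range(1, 5) contains 2

def fake_range(a, b):        # hypothetical stand-in that shadows the builtin
    return []

globals()['range'] = fake_range
after = spam(2, 5)           # same bytecode, but a different 'range' runs
del globals()['range']       # un-shadow; the builtin is found again
restored = spam(2, 5)
```

A cache that remembered the first 'range' would silently skip the shadowing global here.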

So, we have to check if range is still the same function. Which means we
have to look it up and compare identity -- and we've already lost any
advantage the cache could give us. Or we could place a marker on the
__builtin__ module, and invalidate the cached entry of 'range' whenever
'range' is changed. But then we also have to add a marker to the global
namespace, because something can *add* a range there.

And don't think it can't happen: 'range()' could be a function that modifies
the global or builtin namespace. Or it could be a function that deletes
references to a few objects (not that uncommon an operation, really.) And
those objects could then be destroyed (no more references), and then those
objects could trigger destruction of other objects, and one of those objects
could call a function on some module or other that modifies the original
global or builtin namespace. 
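That destruction-cascade scenario can be sketched like this (class and function names hypothetical, and relying on CPython's immediate refcount-based destruction):

```python
def helper():
    return 'original'

class Sneaky:
    # __del__ runs as soon as the last reference disappears (in CPython)
    def __del__(self):
        globals()['helper'] = lambda: 'replaced'

def use_helper():
    obj = Sneaky()
    first = helper()         # a cache would remember this 'helper'
    del obj                  # destruction rebinds the global name...
    second = helper()        # ...so the cached entry is already stale
    return first, second

pair = use_helper()
```

Nothing in use_helper's own code touches 'helper', yet the two calls hit different functions.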

Another example, the simple case of:

def spam(egg):
    egg.spam()
    egg.spam()

If egg or a parent class of egg does not define a spam() method but does
define a __getattr__, this could return two entirely different things. So,
in caching egg.spam, we need to make sure it's been located 'statically'.
(This can be read as 'without __getattr__', or as 'analyzed __getattr__ and
determined each call with the same arguments would return the same object')

Also, we need a way to invalidate the cached entry of egg.spam if egg or
any of its parents, up to the one defining 'spam', have something done to
their 'spam' attribute. And that means that whenever you change an attribute
on a class or instance, you need to traverse the entire tree of subclasses
and instances (up to one that 'masks' the change) looking for cached entries
and invalidating them.
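The invalidation problem itself is easy to demonstrate (hypothetical classes):

```python
class Base:
    def walk(self):
        return 'silly walk'

class Child(Base):
    pass

c = Child()
before = c.walk()                   # found by climbing up to Base
Base.walk = lambda self: 'patched'
after = c.walk()                    # every existing instance sees the change

class Masked(Base):
    def walk(self):                 # a subclass that 'masks' Base.walk,
        return 'masked'             # so the patch must not reach it

m = Masked()
```

Any cached entry for c.walk taken before the patch would keep returning the old method, which is exactly the bug the traversal has to prevent.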

And of course, if you do something like:

  egg.ham.viking.spam.plate.drop()

you have to consider caching *each lookup* ;-P I don't think lookups are
*that* expensive. I'm not sure what the penalty for a failed lookup is; I
seem to recall something about a successful lookup being relatively cheap
compared to a failed one, and a 2nd lookup is not likely to fail ;)

> Indeed, as my bet is that the majority of the time, in the majority of
> code, things are quite "hands off" (i.e., no touching :)), I would be
> surprised that this would prove a problem. In inner loops, indeed, it
> could be quite effective, and would be no different than *manually*
> copying the function to the local--a move, I hope you agree, can provide
> a substantial speedup.

I've measured about 10% speed increase (according to pybench) in method
calls, by storing each method in the instance-dict after it was created. It
also seemed to have the same effect on the TryExcept test, but that's
probably because it uses a lot of methods.  However, at the same time, some
other tests decreased in performance about as much, ending up actually 0.76%
slower than before ;)

The speedup can probably be a lot more, if you do (compile time) analysis of
how many times a given method is called, and cache only those. Or, of
course, if you *manually* cache the function to the instance-dict or the
local namespace (even better -- one lookup less).
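The manual version of both caches might look like this (class name hypothetical; actual gains depend on the workload):

```python
class Parrot:
    def squawk(self):
        return 'squawk!'

p = Parrot()

# cache into the local namespace: one attribute lookup, reused many times
squawk = p.squawk
local_results = [squawk() for _ in range(3)]

# cache into the instance dict: later p.squawk hits the instance dict
# directly, skipping the class lookup and the per-access method binding
p.squawk = p.squawk
cached = 'squawk' in p.__dict__
```

The local-name version saves one lookup more than the instance-dict version, which is why it is "even better".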

> I confess to not being an expert in these matters, but I don't know I
> understand why the *whole* cache would have to be invalidated if one
> function object were "touched". Er..in mere other words...More please!

I was a bit careless (again ;) in my wording. I meant to say that the
individual cached entry has to be invalidated.

> > What's more, currently 'bound methods' (methods which have a 'self' to which
> > they apply) are created on demand -- the unbound method exists on the class,
> > but it has to be 'bound' to an instance before it can be used. This is done
> > each time you access the method, though CPython is being smart and you have
> > to try hard to actually see that ;-)

> Hmm. Ok. I think. Maybe. Why it has to be done each time one accesses the
> *instance*'s method I find a tad confusing. I mean ok, you change the
> class def, you'll want to update the instances. But surely it's ok to
> make class updates more expensive than method access?

Yes, but this works, and caching & invalidation doesn't :-) It would take no
small amount of code to add it, even if you discard the ideas about
invalidation (say, if you add the new code under a -OOO option, and tell
people not to f*** around with base classes after instances have been
created, which I doubt Guido or any of the other core developers would
allow.)

> Ewww! :) But *why* do this? walk hasn't changed. It's surely
> determinable that it hasn't changed over the course of x,y=(cleese.walk,
> cleese.walk), yes?

Yes, by comparing identity of the retrieved objects, but then it's already
too late :-) "a,b=c.walk,c.walk" gets compiled roughly into this:

          6 LOAD_FAST                0 (c)
          9 LOAD_ATTR                1 (walk)
         12 LOAD_FAST                0 (c)
         15 LOAD_ATTR                1 (walk)
         18 BUILD_TUPLE              2
         21 UNPACK_TUPLE             2
         24 STORE_FAST               1 (a)
         27 STORE_FAST               2 (b)

As you see, separate bytecodes for each lookup. One bytecode has no view of
what the previous bytecodes were doing, so you'd have to either recognize
this in the compiler and generate alternative bytecodes, or rewrite the
bytecode interpreter to do caching on *all* lookups (of a certain type.)
And see above about that problem ;)
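You can reproduce that kind of disassembly yourself with the dis module (exact opcodes and offsets vary across Python versions):

```python
import dis
import io

def assign(c):
    a, b = c.walk, c.walk
    return a, b

buf = io.StringIO()
dis.dis(assign, file=buf)
listing = buf.getvalue()

# the compiler emits one separate LOAD_ATTR per attribute lookup,
# even for two textually identical lookups in the same statement
attr_loads = listing.count('LOAD_ATTR')
```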

So, to the casual reader it might seem obvious that Cleese's walk doesn't
change, but the compiler doesn't know *anything* about Cleese, and though
the interpreter does know everything about Cleese, it doesn't know that
Cleese has already been queried about his walk method. And adding that
knowledge to either (or, better for optimization, to both) is a major project.

Hopefully, now that a couple of the Big Brains can devote their energy to
the furtherance of Python full-time, they can devote some of that in this
direction ;) But I can imagine there are more important things than this
potential 10% (or so) speedup.

So, now for the DISCLAIMER: I'm just a hobbyist. I don't know *that* much
about Python (yet ;), I've only been mucking around in its internals for a
couple of weeks. I also have no experience with compilers, cache-mechanisms
or optimizations other than a few good books on the subject (parts of the
Dragon book, O'Reilly's High Performance Computing, and books about the
internals of computers/compilers in general, like those of Tanenbaum and v/d
Linden.) So, please, do not take my words for unbendable truth, for even if
they are true, they're likely flexible beyond belief ;)

lawyer-like-ly yr's,
-- 
Thomas Wouters <thomas at xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
