Why is Python so slow ?- revisited.

Wed Jun 21 07:45:56 EDT 2000

Thomas Wouters <thomas at xs4all.net> wrote:
[function method caching]

> I know next to nothing about Smalltalk, let alone how it optimizes, but I
> can tell you why it would be complicated in Python ;) Lets take a simple
> example:

> def spam(x, y):
>     if x in range(1, y):
>         return 1
>     if x in range(y, 1):
>         return 1
>     return 0

> Might not seem very useful, but it's just an example ;) This can easily be
> optimized, right ? We lookup 'range', it's a builtin, we store the address,
> and call it two times. Well, no. We dont *know* range a is a builtin, it
> could be a global that points elsewhere. If the 'spam' function contained a
> 'from module import *' or 'exec' statement, it could even be a local.

> So, we have to check if range is still the same function. Which means we
> have to look it up and compare identity -- and we've already lost any
> advantage the cache could give us. Or we could place a marker on the
> __builtin__ module, and invalidate the cached entry of 'range' whenver
> 'range' is changed. But then we also have to add a marker to the global
> namespace, because something can *add* a range there.

You could do some analysis of local and global namespace, and see 
that it is possible that 'range' is altered. If such a possibility
exists at all (assignment to the name, or by from import), then range
is simply never cached at all.

If 'from module import *' is used, the user deserves what they
asked for and *nothing* in that module is cached. I'm not sure
how 'exec' can be used to screw things up, but let's say we can apply
the same rule to that. Though that is possibly more tricky..

Something along the lines of this approach could work, I think? The
approach would be to forbid the caching of certain names if it's
detected certain things could happen to them that would make them
uncachable.

> And don't think it can't happen:
[snip things with range() that could happen but almost never *actually*
happen]

[__getattr__]

If __getattr__ is defined in a class, we might still be able to cache
the methods that do exist. Anything that exists only due to
__getattr__ isn't cached, obviously.

> Also, we need a way to invalidate the cached entry of egg.spam if any of egg or
> any of its parent up to the one defining 'spam', have something done to
> their 'spam' attribute.

If the possibility exist that a method is changed, don't cache that
method. Of course setattr() will be a major problem; very hard to determine
whether setattr() won't affect some class at some point. And you won't
know what class, or what name. Hmm.

> And that means than whenever you change an attribute
> on a class or instance, you need to traverse the entire tree of subclasses
> and instances (up to one that 'masks' the change) looking for cached entries
> and invalidating them.

Far cheaper would be simply to invalidate the entire cache in such
a case. That is, if class attributes are changed. I'm not sure why
you'd need to invalidate things when you change something on an instance,
though? (except the thing that's changed itself)

> And of course, if you do something like:

>   egg.ham.viking.spam.plate.drop()

> you have to consider *each lookup* cached ;-P I don't think lookups are
> *that* expensive. I'm not sure what the penalty for a failed lookup is, I
> seem to recall something about a successful lookup being relatively cheap,
> compared to a failed one, and a 2nd lookup is not likely to fail ;)

I don't seem to understand this bit. What does it mean, considering
each lookup cached? Could you explain?

[snip]
>> Hmm. Ok. I think. Maybe. Why it has to be do each time one accesses the
>> *instance*'s method I find a tad confusing. I mean ok, you change the
>> class def, you'll want to update the instances. But surely it's ok to
>> make class updates more expensive than method access?

> Yes, but this works, and caching & invalidation doesn't :-) It would be no
> little ammount of code to add it, even if you discard the ideas about
> invalidation (say, if you add the new code under a -OOO option, and tell
> people not to f*** around with base classes after instances have been
> created, which I doubt Guido or any of the other core developers would
> allow.)

I think you need the ideas of invalidation. I think you also need the
idea of "don't ever cache this" (or at least "don't cache this until I
tell you to"). I think that approach can help with some of the tricky bits.

Regards,

Martijn
-- 
History of the 20th Century: WW1, WW2, WW3?
No, WWW -- Could we be going in the right direction?