Building CPython

Mark Lawrence breamoreboy at yahoo.co.uk
Fri May 15 22:17:27 EDT 2015


On 16/05/2015 02:55, Steven D'Aprano wrote:
> On Sat, 16 May 2015 09:27 am, Mark Lawrence wrote:
>
>> On 15/05/2015 23:44, Marko Rauhamaa wrote:
>>> BartC <bc at freeuk.com>:
>>>
>>>> What /is/ a method lookup? Is it when you have this:
>>>>
>>>>    A.B()
>>>>
>>>> and need to find whether the expression A (or its class or type) has a
>>>> name B associated with it? (And it then needs to check whether B is
>>>> something that can be called.)
>>>>
>>>> If so, does that have to be done using Python's Dict mechanism? (Ie.
>>>> searching for a key 'B' by name and seeing if the object associated
>>>> with it is a method. That does not sound efficient.)
>
> It's not as inefficient as you may think. Dicts are hash tables, and hash
> tables are a standard computer science data structure for performing very
> fast searches at constant (or near constant) time.
>
> The dict is basically an array of pointers to (key, value). To look a name
> up in the dict, you hash the string which gives you an index into the
> array, then look at that position. If it is blank, you know there is no
> match. If it points to a string, you compare that string to your string. If
> they are equal, then you have a match. If they aren't equal, you have a
> collision, and you have to look elsewhere (details differ) but typically
> you don't end up looking in more than one or two positions. So all pretty
> fast, and close enough to constant time.
>
> To speed things up even further, I think that the hash value is cached with
> the string, so it only needs to be calculated the first time.
>
>
>
>>> That is a general feature among high-level programming languages. In
>>> Python, it is even more complicated:
>>>
>>>    * first the object's dict is looked up for the method name
>>>
>>>    * if the method is not found (it usually isn't), the dict of the
>>>      object's class is consulted
>>>
>>>    * if the method is found (it usually is), a function object is
>>>      instantiated that delegates to the class's method and embeds a "self"
>>>      reference to the object to the call
>
> It's the other way around. The function object already exists: you created
> it by writing `def method(self, *args): ... ` inside the class body. def
> always makes a function. It's the *method* object which is created on the
> fly, delegating to the function.
>
>
>
>>> IOW, two dict lookups plus an object construction for each method call.
>>>
>>>
>>> Marko
>>>
>>
>> As a picture paints a thousand words is anybody aware of a site or sites
>> that show this diagramatically, as I think I and possibly others would
>> find it far easier to grasp.
>
> No I'm not aware of any such site, but I can try to make it more obvious
> with an example.
>
> Suppose we have a hierarchy of classes, starting from the root of the
> hierarchy (object) to a specific instance:
>
> class Animal(object):
>      pass
>
> class Mammal(Animal):
>      pass
>
> class Dog(Mammal):
>      def bark(self): ...
>
> laddy = Dog()
>
>
> We then look up a method:
>
> laddy.bark()
>
> In a non-dynamic language like Java, the compiler knows exactly where bark
> is defined ("in the Dog class") and can call it directly. In dynamic
> languages like Python, the compiler can't be sure that bark hasn't been
> shadowed or overridden at runtime, so it has to search for the first match
> found. Simplified:
>
> * Does laddy.__dict__ contain the key "bark"? If so, we have a match.
>
> * For each class in the MRO (Method Resolution Order), namely
>    [Dog, Mammal, Animal, object], does the class __dict__ contain the
>    key "bark"? If so, we have a match.
>
> * Do any of those classes in the MRO have a __getattr__ method? If
>    so, then try calling __getattr__("bark") and see if it returns
>    a match.
>
> * Give up and raise AttributeError.
>
> [Aside: some of the things I haven't covered: __slots__, __getattribute__,
> how metaclasses can affect this.]
>
> In the case of laddy.bark, the matching attribute is found as
> Dog.__dict__['bark']:
>
>      temp = Dog.__bark__['bark']  # laddy.bark, is a function
>
> At this point, the descriptor protocol applies. You can ignore this part if
> you like, and just pretend that laddy.bark returns a method instead of a
> function, but if you want to know what actually happens in all it's gory
> details, it is something like this (again, simplified):
>
> * Does the attribute have a __get__ method? If not, then we just
>    use the object as-is, with no changes.
>
> * But if it does have a __get__ method, then it is a descriptor and
>    we call the __get__ method to get the object we actually use.
>
>
> Since functions are descriptors, we get this:
>
>      temp = temp.__get__(laddy, Dog)  # returns a method object
>
> and finally we can call the method:
>
>      temp()  # laddy.bark()
>
>
> None of these individual operations are particularly expensive, nor are
> there a lot of them. For a typical instance, the MRO usually contains only
> two or three classes, and __dict__ lookups are fast. Nevertheless, even
> though each method lookup is individually *almost* as fast as the sort of
> pre-compiled all-but-instantaneous access which Java can do, it all adds
> up. So in Java, a long chain of dots:
>
>      foo.bar.baz.foobar.spam.eggs.cheese
>
> can be resolved at compile-time, and takes no more time than
>
>      foo.cheese
>
> but in Python's case it has to be resolved at run-time, so if you care about
> speed, you should try to avoid long chains of dots in performance critical
> loops. E.g. instead of:
>
>      for x in sequence:
>          foo.bar.baz.foobar.spam.eggs.cheese(x)
>
> you can write:
>
>      cheese = foo.bar.baz.foobar.spam.eggs.cheese
>      for x in sequence:
>          cheese(x)
>

Thanks for taking the time over this, I'll study it later today after 
hopefully a good night's sleep :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence




More information about the Python-list mailing list