A Python 3000 Question

Chris Mellon arkanes at gmail.com
Wed Oct 31 19:58:05 EDT 2007


On Oct 31, 2007 6:02 PM, George Sakkis <george.sakkis at gmail.com> wrote:
> On Oct 31, 6:13 pm, Steven D'Aprano
>
> > What you have measured is a local optimization that is only useful when
> > you have a tight loop with lots of calls to the same len():
> >
> > Len = sequence.__len__
> > while Len() < 100000:
> >     foo(sequence)
>
> Exactly what timeit() does, a tight loop.
>
> > But what Neil is measuring is relevant for the more general case of
> > calling len() on arbitrary objects at arbitrary times:
> >
> > x = len(sequence)
> > y = 42 + len("xyz")
> > if len(records) == 12:
> >     print "twelve!"
> >
> > To get the benefit of the micro-optimization you suggested, you would
> > need to write this:
> >
> > Len = sequence.__len__
> > x = Len()
> > Len = "xyz".__len__
> > y = 42 + Len()
> > Len = records.__len__
> > if Len() == 12:
> >     print "twelve!"
> >
> > Not only is that an especially ugly, confusing piece of code, it is also
> > counter-productive as you still have to do the method lookup every single
> > time.
>
> The whole point of the optimization is avoiding 99999 attribute
> lookups of the timeit loop. For a single call both times are
> practically zero so the snippet above is just silly.
>
> > Taking the method lookup is an apple-and-oranges comparison. It is useful
> > to know for the times when you don't want an apple, you want an orange
> > (i.e. the tight loop example above).
> >
> > This is an apples-and-apples comparison:
> >
> > >>> import timeit
> > >>> timeit.Timer("Len = seq.__len__; Len()", "seq = range(12)").repeat()
> >
> > [1.3109469413757324, 0.7687380313873291, 0.55280089378356934]
> > >>> timeit.Timer("len(seq)", "seq = range(12)").repeat()
> >

For the record, the apples to apples way is to STORE_FAST len as well,
because what the comparison above actually does is pit a LOAD_GLOBAL
of len against a LOAD_ATTR, a STORE_FAST, and a LOAD_FAST. len() is
still faster on builtins when you do it "right":

C:\>python -m timeit -s "s=range(12);l=type([]).__len__" "l(s)"
1000000 loops, best of 3: 0.258 usec per loop

C:\>python -m timeit -s "l = len;s=range(12)" "l(s)"
10000000 loops, best of 3: 0.0994 usec per loop

This is absolutely apples to apples because exactly the same bytecodes
are being timed.
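
If you want to see the "exactly the same bytecodes" part for
yourself, here's a quick dis sketch (mine, not timings from this
thread): once both callables are bound to a local name, the timed
call compiles to the same LOAD_FAST/CALL/RETURN sequence, and only
the object behind the name differs. Exact opcode names vary a bit
between Python versions.

import dis

def timed_with_len(s, l=len):
    return l(s)

def timed_with_dunder(s, l=type([]).__len__):
    return l(s)

# Both disassemble to the same opcodes (roughly LOAD_FAST l,
# LOAD_FAST s, CALL, RETURN); only the object bound to l differs.
dis.dis(timed_with_len)
dis.dis(timed_with_dunder)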

However, you can make the first faster by using an already-bound
method. This saves some runtime cost inside the __len__ call, and it
saves a LOAD_FAST in the calling code since there's no argument to
pass. It's still slower than len(), if only by the barest fraction of
a microsecond (I ran these a few dozen times each and the difference,
while tiny, was consistent: __len__ was always just over a tenth of a
usec, len() always just under it).

C:\>python -m timeit -s "s=range(12);l=s.__len__" "l()"
10000000 loops, best of 3: 0.114 usec per loop
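
To see where the saved LOAD_FAST comes from, compare the disassembly
of a bound call with an unbound one (again my sketch, not something
from this thread): the bound method already carries the list as its
self, so nothing has to be loaded and passed at call time.

import dis

def call_bound(l=[].__len__):
    return l()       # nothing to pass: self is baked into the bound method

def call_unbound(l=type([]).__len__, s=[]):
    return l(s)      # one extra LOAD_FAST to push the list as an argument

dis.dis(call_bound)
dis.dis(call_unbound)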

> > [0.60111594200134277, 0.5977931022644043, 0.59935498237609863]
> >
> > The time to do the lookup and call the method is about 0.6 microseconds
> > whether you do it manually or let len() do it.
>
> That assumes you know the internals of len(). Perhaps it does
> something smarter for lists, or perhaps attribute lookup is faster at
> the C level. I don't know, and that's exactly the point, I don't need
> to know the internals to do the comparison of two callables.
>

Types implemented in C expose the sequence protocol as a struct of
function pointers, which len() derefs and calls directly, while
__len__ on the builtin is a Python-level wrapper (a bound
method-wrapper) around that same slot, so calling it goes through the
slower attribute-and-call machinery.

The speed difference will be reversed for types implemented in Python,
though. So now you know.
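
Here's a rough way to check the reversed case yourself (my sketch;
numbers will depend on your interpreter): for a class whose __len__
is written in Python, len() still has to go through the type's length
slot and then dispatch into the Python method, so the pre-bound
__len__ can come out slightly ahead.

import timeit

class Sized(object):
    def __len__(self):
        return 12

setup = "from __main__ import Sized; s = Sized(); bound = s.__len__"
# len() has to dispatch into the Python-level __len__ anyway
print(min(timeit.Timer("len(s)", setup).repeat()))
# the pre-bound Python method is called directly
print(min(timeit.Timer("bound()", setup).repeat()))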

> > The logic behind the micro-optimization trick is to recognize when you
> > can avoid the cost of repeated method lookups by doing it once only. That
> > is not relevant to the question of whether len() should be a function or
> > a method.
>
> No disagreement here; I didn't bring up performance and it's not at
> all a reason I consider len() as a function to be a design mistake.
> What I dispute is Neil's assertion that "len as a builtin can be
> faster than len as an attribute" and bringing as evidence the timings
> that include attribute lookups. That statement is just wrong;
> comparing X to an attribute assumes you have the attribute in the
> first place, you don't need to look it up.
>

And I don't think it's a design mistake, for all the other reasons
already mentioned, not because it's 6 thousandths of a millisecond
faster in a microbenchmark.


