A Python 3000 Question

Wed Oct 31 18:13:46 EDT 2007

On Wed, 31 Oct 2007 13:14:41 +0000, George Sakkis wrote:

> On Oct 31, 8:44 am, Neil Cerutti <horp... at yahoo.com> wrote:
>> On 2007-10-30, George Sakkis <george.sak... at gmail.com> wrote:
>>
>> > On Oct 30, 11:25 am, Neil Cerutti <horp... at yahoo.com> wrote:
>> >> On 2007-10-30, Eduardo O. Padoan <eduardo.pad... at gmail.com> wrote:
>>
>> >> > This is a FAQ:
>> >> >http://effbot.org/pyfaq/why-does-python-use-methods-for-some-function...
>>
>> >> Holy Airy Persiflage Batman!
>>
>> >> Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
>> >> Type "help", "copyright", "credits" or "license" for more information.
>> >> >>> import timeit
>> >> >>> timeit.Timer('len(seq)', 'seq = range(100)').timeit()
>> >> 0.20332271187463391
>> >> >>> timeit.Timer('seq.__len__()', 'seq = range(100)').timeit()
>> >> 0.48545737364457864
>>
>> > Common mistake; try this instead:
>>
>> > timeit.Timer('seqlen()',
>> >              'seq = range(100); seqlen=seq.__len__').timeit()
>>
>> Why would I want to do that?
> 
> 
> To realize that __len__ is actually faster than len in this case; your
> second timing is dominated by the time to do attribute lookup.

What Neil has done is the correct thing to measure in this case.

What you have measured is a local optimization that is only useful when 
you have a tight loop making many calls to len() on the same object:

Len = sequence.__len__
while Len() < 100000:
    foo(sequence)
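
As a rough, self-contained sketch of that tight-loop case (the function 
names are invented for illustration, and the code is written so that it 
also runs on Python 3):

```python
def drain_with_len(seq):
    # Calls the len() builtin on every pass through the loop.
    steps = 0
    while len(seq) > 0:
        seq.pop()
        steps += 1
    return steps


def drain_with_bound(seq):
    # Look up the bound method once, then call it many times.
    Len = seq.__len__
    steps = 0
    while Len() > 0:
        seq.pop()
        steps += 1
    return steps
```

Both functions do the same work; the second only saves the repeated 
lookup, which pays off solely because every call is on the same object.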

But what Neil is measuring is relevant for the more general case of 
calling len() on arbitrary objects at arbitrary times:

x = len(sequence)
y = 42 + len("xyz")
if len(records) == 12:
    print "twelve!"

To get the benefit of the micro-optimization you suggested, you would 
need to write this:

Len = sequence.__len__
x = Len()
Len = "xyz".__len__
y = 42 + Len()
Len = records.__len__
if Len() == 12:
    print "twelve!"

Not only is that an especially ugly, confusing piece of code, it is also 
counter-productive, because you still have to do the method lookup every 
single time, so nothing is saved.

Taking the method lookup out of the timing is an apples-and-oranges 
comparison. It is useful to know for the times when you don't want an 
apple, you want an orange (i.e. the tight loop example above).

This is an apples-and-apples comparison:

>>> import timeit
>>> timeit.Timer("Len = seq.__len__; Len()", "seq = range(12)").repeat()
[1.3109469413757324, 0.7687380313873291, 0.55280089378356934]
>>> timeit.Timer("len(seq)", "seq = range(12)").repeat()
[0.60111594200134277, 0.5977931022644043, 0.59935498237609863]

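Once the manual version settles down after its slower first runs, the time 
to do the lookup and call the method is about 0.6 microseconds per call 
whether you do it manually or let len() do it. (timeit runs the statement 
one million times by default, so the totals in seconds read directly as 
microseconds per call.)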
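
For anyone who wants to repeat the measurement, here is a minimal sketch 
that times all three variants from this thread side by side and reports 
microseconds per call. The helper name time_variants and the labels are 
my own, chosen for illustration; it uses list(range(12)) so that it behaves 
the same on Python 3, and the absolute numbers will differ from machine to 
machine.

```python
import timeit

# (label, statement, setup) for each variant discussed in this thread.
VARIANTS = [
    ("len(seq)", "len(seq)", "seq = list(range(12))"),
    ("lookup + call", "Len = seq.__len__; Len()", "seq = list(range(12))"),
    ("call only", "Len()", "seq = list(range(12)); Len = seq.__len__"),
]


def time_variants(number=100000, repeat=3):
    """Return {label: best time in microseconds per call}."""
    results = {}
    for label, stmt, setup in VARIANTS:
        timer = timeit.Timer(stmt, setup)
        # Take the fastest repetition, which is least disturbed by other
        # activity on the machine, and scale it down to a single call.
        best = min(timer.repeat(repeat=repeat, number=number))
        results[label] = best / number * 1e6
    return results


if __name__ == "__main__":
    for label, usec in time_variants().items():
        print("%-14s %.3f usec per call" % (label, usec))
```

Only the "call only" row leaves the lookup out of the timed statement; 
the other two both pay for it, which is why they are the fair pair to 
compare.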

The logic behind the micro-optimization trick is to recognize when you 
can avoid the cost of repeated method lookups by doing it once only. That 
is not relevant to the question of whether len() should be a function or 
a method.
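
To make concrete what "let len() do it" means: len() finds the object's 
__len__ method and calls it, so any class that defines one works with the 
function. A small sketch, where the Records class is hypothetical and 
exists only for this example:

```python
class Records(object):
    """Hypothetical container, defined only for this illustration."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        # len(records) ends up here; len() performs the lookup itself.
        return len(self._items)


records = Records(range(12))
# Both spellings give the same answer; len() just does the lookup for you.
assert len(records) == records.__len__() == 12
```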

-- 
Steven.