Interesting list() un-optimization

Terry Reedy tjreedy at udel.edu
Thu Mar 7 17:53:27 EST 2013


On 3/7/2013 3:41 PM, Wolfgang Maier wrote:
>
>>>>> Iterators do not generally have __len__ methods.
>>>>>
>>>>>>>> len(iter(range(10)))
>>>>> Traceback (most recent call last):
>>>>>    File "<stdin>", line 1, in <module>
>>>>> TypeError: object of type 'range_iterator' has no len()
>>>>
>>>> But iterators have a length hint method that are used for some
>>>> optimizations and preallocations, too.
>>>>
>>>>>>> i = iter(range(10))
>>>>>>> i.__length_hint__()
>>>> 10
>>>>
>>>> See http://www.python.org/dev/peps/pep-0424/
>>>
>
> very interesting (hadn't heard of it)! Just checked the PEP,
> then tested list()'s behavior, and it is just as described:
>
> class stupid(list):
> 	def __len__(self):
> 		print ('len() called')
> 		return NotImplemented
> 	
> 	def __length_hint__(self):
> 		print ('hint requested')
> 		l=iter(self).__length_hint__()
> 		print (l)
> 		return l
>
> a=stupid((1,2,3))
> len(d)
> ======>
>    len() called
>
>    Traceback (most recent call last):
>      File "<pyshell#79>", line 1, in <module>
>        len(d)
>    TypeError: an integer is required
>
> list(d)
> ======>
>    len() called
>    hint requested
>    3
>    [1, 2, 3]
>
> so list() first tries to call the iterable's __len__ method. If that raises a
> TypeError it falls back to __length_hint__ .
> What I still don't know is how the listiterator object's __length_hint__ works.
> Why, in this case, does it know that it has a length of 3 ? The PEP does not
> provide any hint how a reasonable hint could be calculated.

'How' depends on the iterator, but when it has an underlying concrete 
iterable of known length, it should be rather trivial as .__next__ will 
explicitly or implicitly use the count remaining in its operation. Part 
of the justification of adding __length_hint__ is that in many cases it 
is so easy. Iterators based on an iterator with a length_hint can just 
pass it along.

The list iterator might work with a C pointer to the next item and a 
countdown count initialized to the list length. The __next__ method 
might be something like this mixed C and Python pseudocode:

   if (countdown--) return *(current--);
   else raise StopIteration

(For non-C coders, postfix -- decrements value *after* retrieving it.)
Then __length_hint__ would just return countdown. Tuples would work the 
same way. Sets and dicts would need current-- elaborated to skip over 
empty hash table slots. Range iterators would add the step to current, 
instead of 1.

-- 
Terry Jan Reedy




More information about the Python-list mailing list