Interesting performance question

Sun Sep 29 14:42:44 EDT 2019

On 9/29/2019 9:45 AM, Eryk Sun wrote:
> On 9/29/19, Anthony Flury via Python-list <python-list at python.org> wrote:
>>
>> Using python 3.6 building a tuple like this :
>>
>> my_tuple = tuple([x*x for x in range(1,1000)])
> 
> The list comprehension is implemented internally as a function that
> builds and returns the list. This function creates an empty list and
> loops over the range iterator to evaluate the expression and append
> the result. Appending to a list uses an over-allocation strategy to
> efficiently grow the list.

The internal function can use, and I believe, does use
 >>> iter(range(1,1000)).__length_hint__()
999
to directly allocate space for at least 999 items and skip growing from 
an empty list.

> Finally, the list is passed to the tuple
> constructor, which can efficiently and quickly create a tuple from the
> list because it's simply copying a PyObject * array in C.
> 
>>       my_tuple = tuple(x*x for x in range(1,1000))
> 
> In this case a generator is created instead of a list. This is passed
> to the tuple constructor, which iterates the generator. There's no
> __length_hint__() for a generator, so it starts with a length 10
> tuple. The tuple grows with an over-allocation rule, and at the end
> it's resized to the actual length.
> 
> I expect the generator-based expression to be a bit more expensive.
> Iterating a generator requires resuming evaluation of the code object
> up to a yield, which suspends evaluation. For the list comprehension,
> the loop that builds the list executes continuously, without an
> interruption to yield a value in each pass.
> 


-- 
Terry Jan Reedy